National Standard Reference Data System

The  National  Standard  Reference  Data  System  is  a  government-wide  effort  to  give  to  the 
technical  community  of  the  United  States  optimum  access  to  the  quantitative  data  of  physical 
science,  critically  evaluated  and  compiled  for  convenience.  This  program  was  established  in 
1963  by  the  President's  Office  of  Science  and  Technology,  acting  upon  the  recommendation  of  the 
Federal  Council  for  Science  and  Technology.  The  National  Bureau  of  Standards  has  been  as- 
signed responsibility  for  administering  the  effort.  The  general  objective  of  the  System  is  to 
coordinate  and  integrate  existing  data  evaluation  and  compilation  activities  into  a  systematic,  com- 
prehensive program,  supplementing  and  expanding  technical  coverage  when  necessary,  establish- 
ing and  maintaining  standards  for  the  output  of  the  participating  groups,  and  providing  mecha- 
nisms for  the  dissemination  of  the  output  as  required. 

The  NSRDS  is  conducted  as  a  decentralized  operation  of  nation-wide  scope  with  central  co- 
ordination by  NBS.  It  comprises  a  complex  of  data  centers  and  other  activities,  carried  on  in 
government  agencies,  academic  institutions,  and  nongovernmental  laboratories.  The  independent 
operational  status  of  existing  critical  data  projects  is  maintained  and  encouraged.  Data  centers 
that  are  components  of  the  NSRDS  produce  compilations  of  critically  evaluated  data,  critical  re- 
views of  the  state  of  quantitative  knowledge  in  specialized  areas,  and  computations  of  useful 
functions  derived  from  standard  reference  data. 

For operational purposes, NSRDS compilation activities are organized into seven categories
as  listed  below.  The  data  publications  of  the  NSRDS,  which  may  consist  of  monographs,  loose- 
leaf  sheets,  computer  tapes,  or  any  other  useful  product,  will  be  classified  as  belonging  to  one 
or  another  of  these  categories.  An  additional  "General"  category  of  NSRDS  publications  will 
include  reports  on  detailed  classification  schemes,  lists  of  compilations  considered  to  be  Standard 
Reference  Data,  status  reports,  and  similar  material.  Thus,  NSRDS  publications  will  appear  in 
the  following  eight  categories : 


Category   Title

1          General
2          Nuclear Properties
3          Atomic and Molecular Properties
4          Solid State Properties
5          Thermodynamic and Transport Properties
6          Chemical Kinetics
7          Colloid and Surface Properties
8          Mechanical Properties of Materials

Information Handling in the National Standard Reference Data System

Franz  L.  Alt 

A  preliminary  plan  is  presented  for  the  selection,  acquisition,  intellectual  organization, 
and  storage  of  the  information  which  will  underlie  the  Information  Services  Operation  of 
the  National  Standard  Reference  Data  System,  as  well  as  for  methods  of  locating  desired 
information  items  in  storage,  retrieving,  and  displaying  or  communicating  them.  Questions 
of  the  use  of  computers  for  these  purposes  are  discussed,  including  selection  of  equipment, 
arrangement  of  digital  storage,  input  format,  remote  access,  and  the  economics  of  choosing 
certain  functions  of  the  system  for  mechanization.  Also,  an  interim  system,  based  on  con- 
ventional and,  in  the  main,  manually  operated  files,  is  described. 

Key  Words:  Computer-aided  inquiry  service,  data  retrieval,  file  mechanization,  infor- 
mation retrieval,  standard  reference  data. 

1.  Introduction 


1.1.  Plan  of  Approach 

1.1.1.  Development  of  NSRDS 

The present report describes proposed plans for
the handling of information in the National
Standard Reference Data System (NSRDS). It
is concerned with the information with which the
system deals, with the logical organization of this
information, its acquisition and physical storage;
with methods of locating desired information items
in storage, retrieving and displaying or communicating
them.

It is expected that the information system of
NSRDS will undergo evolutionary changes for a
number of years. The later stages of this development
cannot yet be foreseen with complete
clarity, because NSRDS itself is developing.
Details of the system design will depend on certain
parameters (e.g., size and rate of growth of the
data collection) whose eventual values are not yet
known, or on changes in technology which are
likely to develop during the next few years, and
on the solution of some of the research problems
which will be undertaken by NSRDS itself.
Despite all this uncertainty about the future
development of the information system, two facts
emerge which have been adopted as basic policy
decisions and which underlie all other decisions
to be made: (1) Ultimately the NSRDS information
system will involve the use of large electronic
digital computers in a crucial way (though not
necessarily to the complete exclusion of manual
methods). (2) At present the introduction of
such computers into the principal operations of
NSRDS would be premature.
These two statements are plausible in themselves
and will become more so in the course of
discussing the economics of computer operation
in section 2 of this report. From them we infer
immediately that we should distinguish at least
two periods in the operation of NSRDS: the long-range
future, during which the operation will be
characterized by computer use; and the near


future,  which  will  have  a  different  regime — as  we 
shall  argue,  conventional  and  in  the  main  man- 
ually operated  files,  combined  with  information 
stored  in  human  minds. 

This  leaves  several  questions  still  to  be  answered. 
First,  it  may  appear  plausible  that  the  transition 
to  the  computer  system  should  be  gradual  or  in 
a  number  of  steps,  rather  than  all  at  once.  Sec- 
ond, one  may  ask  whether  there  should  be  one 
or  more  intermediate  periods,  characterized  by 
methods  which  differ  both  from  the  ultimate  large 
computer  system  and  the  initial  manual  file  sys- 
tem. Such  intermediate  methods  would  differ 
from  mere  transitional  steps  on  the  way  to  full 
computer  operation,  by  definition,  in  that  the 
former  call  for  some  investment  in  hardware  or 
procedures  which  would  be  retained  more  or  less 
without  change.  A  more  detailed  discussion  of 
these  questions  given  later  in  the  present  report 
will  favor  a  gradual  transition  from  manual  to 
computer  operation,  but  without  recourse  to  costly 
intermediate  techniques  that  would  be  discarded 
before  economic  use  is  made  of  the  investment. 

A  third  open  question  is  that  of  time.  A 
precise  prediction  of  how  soon  a  computer  can  be 
used  effectively  and  economically  is  not  possible 
at  present,  but  it  seems  likely  that  the  initial  man- 
ual information  system  will  remain  in  operation 
for  at  least  three  years,  possibly  longer.  The 
speed  of  phasing  from  a  preponderantly  manual 
to  a  preponderantly  machine  system  will  depend 
upon the activities and development of the Technical
Data Centers and other as yet unforeseeable
factors.  At  each  stage  during  this  gradual  tran- 
sition there  should  be  ample  opportunity  for  us  to 
make  decisions  more  confidently  as  we  observe  the 
system  in  operation. 

Not  only  will  the  transition  to  machine  meth- 
ods be  gradual;  in  some  instances  it  should  not 
take  place  at  all.  For  example,  it  is  not  clear  that 
inquiries  can  be  answered  reliably  in  toto  by  a  ma- 
chine. We  may  find  that  machine  methods  can  be 
used  to  narrow  a  search  to  a  few  choices,  the  final 


selection  to  be  done  by  humans,  or  vice  versa,  hu- 
mans to  perform  a  preliminary  screening  and 
switching of inquiries.
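
The division of labor described above, in which the machine narrows a search and a human makes the final selection, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record layout, the ranking by shared index terms, and the names `machine_narrow` and `answer_inquiry` do not come from the report.

```python
from typing import Callable, Optional, Sequence


def machine_narrow(records: Sequence[dict], query_terms: set, limit: int = 3) -> list:
    """Machine step: rank records by shared index terms and keep a short list.

    The scoring rule (count of shared terms) is a hypothetical stand-in for
    whatever search procedure a real system would use.
    """
    scored = [(len(query_terms & set(r["terms"])), r) for r in records]
    matching = [(score, r) for score, r in scored if score > 0]
    # sort() is stable, so records with equal scores keep their file order
    matching.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in matching[:limit]]


def answer_inquiry(
    records: Sequence[dict],
    query_terms: set,
    human_select: Callable[[list], dict],
    limit: int = 3,
) -> Optional[dict]:
    """Combined step: the machine narrows, the human makes the final choice."""
    shortlist = machine_narrow(records, query_terms, limit)
    if not shortlist:
        return None  # nothing found; the inquiry would go to a human expert
    return human_select(shortlist)


# Hypothetical records, for illustration only
RECORDS = [
    {"id": "A", "terms": ["boiling", "point", "water"]},
    {"id": "B", "terms": ["melting", "point", "iron"]},
    {"id": "C", "terms": ["viscosity", "oil"]},
]
```

The reverse arrangement mentioned in the text, with humans screening inquiries before the machine search, would simply place the human step in front of `machine_narrow`.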

The  successive  systems  are  not  quite  independent 
of  each  other.  It  is  easy  to  see  that  a  choice  be- 
tween alternative  procedures  in  the  computer- 
based  system  might  be  influenced  by  the  way  in 
which  the  same  feature  has  been  handled  in  the 
preceding  (manual  or  intermediate)  system. 
Similarly,  and  more  importantly,  the  design  of  the 
initial  and  any  intermediate  system  should  prefer- 
ably avoid  anything  that  might  impede  transition 
to  the  most  desirable  form  of  the  ultimate  com- 
puter system.  In  order  to  facilitate  the  exposi- 
tion, we  therefore  propose  to  discuss  the  long- 
range  information  system  first,  and  the  short-range 
one afterwards.

Thus,  the  present  report  is  organized  as  follows. 
The  next  section  summarizes  the  main  results. 
The  remaining  portions  of  section  1  describe  the 
organization  and  functions  of  NSRDS  to  the  ex- 
tent needed  for  our  discussion  of  the  information 
system.  A  more  complete  description  has  been 
given  elsewhere,1  and  readers  familiar  with  it  may 
bypass  these  sections.  Next,  section  2  develops  the 
ultimate  computer-based  system  as  far  as  our  pres- 
ent ideas  go.  This  is  followed  in  section  3  by  a 
description  of  the  system  envisaged  for  the  next 
few  years,  and  finally  by  a  few  comments  on  the 
transition  between  the  two. 

We have referred to the long-range system as
the  "ultimate"  one.  By  this  we  do  not  mean  that 
it  will  be  frozen  forever;  rather,  that  we  have 
taken  into  account  everything  that  we  can  foresee 
about  it  at  this  time.  The  system  will  undoubtedly 
undergo  further  changes,  but  they  must  be  disre- 
garded in  our  present  planning. 

We  envisage  NSRDS  not  as  an  entity  all  by  it- 
self but  as  one  of  a  number  of  information  activi- 
ties constituting  the  emerging  National  Scientific 
and  Technical  Information  System  currently  un- 
der study  by  COSATI.  In  particular,  we  en- 
deavor to  keep  NSRDS  compatible  with  the  infor- 
mation systems  of  AEC,  NASA,  DDC,  the  Clear- 
inghouse for  Federal  Scientific  and  Technical  In- 
formation, and  other  agencies. 

1.1.2.  Recommendations 

In  this  section  we  summarize  briefly  the  princi- 
pal results  of  the  study,  especially  as  they  lead  to 
recommendations  for  action.  It  has  already  been 
mentioned,  and  will  become  more  evident  in  later 
sections,  that  these  results  are  still  somewhat  tenta- 
tive. This  is  unavoidable,  in  view  of  the  uncer- 
tainty of  many  of  the  premises  on  which  they  are 
based.  The  best  that  can  be  done  at  this  time  is 
to  give  a  full  presentation  of  the  pros  and  cons  for 
each of the major decisions. In order to enable
the reader to find this information selectively, if
he so desires, the following list of recommendations
is cross-referenced to those sections of this
Note in which he may find a discussion of underlying
assumptions, facts, and arguments and,
where applicable, of alternatives which were considered.
It will become clear that many of the
recommendations, especially those intended for implementation
several years hence, are subject to
change in the interim, if such change should become
advisable, e.g., through new information
about the availability and rate of generation of
data, about the operation of data centers, about
performance and cost of computers, and the operating
experience of the Office of Standard Reference
Data itself.

1 E. L. Brady and M. B. Wallenstein, National Standard Reference
Data System - Plan of Operation. NSRDS-NBS 1, U.S.
Government Printing Office, 1964.

Summary of Recommendations

1. A conventional manual data file system for
the near future.     (1.1, 3)

2. A system based on a digital computer for the
more distant future.     (1.1, 2)

3. Begin rendering services using System (1) at
once.     (1.1, 3.2.1, 4)

4. Plan on the bulk of System (2) being implemented
in 3 to 5 years.     (1.1)

5. Transition from (1) to (2) in steps, but without
major investment in any temporary intermediate
system.     (1.1, 4)

6. Aim at man-machine cooperation rather than
at complete mechanization of information retrieval.
(2.1.3, 2.5.1)

7. Share time on a large general-purpose computer
operated by NBS.     (2.1.4, 2.5.3)

8. Obtain an external storage unit of about [illegible]
million words capacity, with a transfer rate of at
least 1 to 2 megabits per second, for exclusive use
by OSRD.     (2.1.5, 2.4.5)

9. Establish in the office of OSRD a console for
direct on-line access to the computer.     (2.[illegible],
2.5.1)

10. At a later stage, similar consoles should be
available throughout the country, for connection
to the computer by long-distance telephone.
(2.1.5, 2.5.2)

11. Use of the computer to include information
retrieval, file updating, aid to publication (editing
and typesetting), and housekeeping.     (2.2)

12. OSRD to devise standard formats for key
punching of data, e.g., into 80-column punched
cards, these formats to be observed as far as possible
by OSRD and all Data Centers.     (2.3.4)

13. The burden of keypunching by OSRD to be
relieved by using machinable data punched by
other sources, or for other purposes, and data generated
by computers or automatic print readers.
(2.3.3)

14. Data in the master file to be arranged by
properties or, more generally, by homogeneous
groups of related properties.     (2.4.2)

15. Functions of one or more variables to be represented,
where appropriate, by approximating
polynomials (or other series) using maximal
intervals.     (2.4.3)

16. Use of the computer to be in "batch" mode
whenever possible; remote on-line access to computer
from OSRD when necessary for efficient
man-machine interaction.     (2.5.1)

17. Future results of critical evaluation of data
to be assembled in a separate file in OSRD. When
no standard reference data are available, data inquiries
to be answered with best data in the literature,
with suitable disclaimer.     (3.2.1, 3.2.5)

18. Initially, a large fraction of all inquiries
received will be referred to experts; number of
referrals should gradually drop but not to zero.
(3.2.2)

19. Technical area managers, Data Centers,
NBS scientists and occasionally others to be used
as experts for replying to inquiries.     (3.2.2)

20. Use of citation indexes, bibliographic coupling,
and other aids to literature referencing to
be explored.     (3.2.3)

21. Graphical information to be handled by
computer-controlled curve plotter, or if this is
not possible, by microfiche. Control of the latter
system by central digital computer to be studied.
(3.2.4)

22. Publication of NSRD Series to be continued;
a periodical, especially with bibliographic
information, to be considered later.     (3.3.1)

23. Computer aids to publication to be continued,
and further development especially of
editing codes to be pushed.     (3.3.2)

24. Needs of potential users as to frequency,
types, and form of information to be established
through user surveys.     (3.4.1)

25. Concurrently with the start of information
services, OSRD to broaden its information by a
bibliographic survey of existing data compilations,
and by a questionnaire to prospective users.
(3.4.2)

26. Establishing a small thesaurus of subject
index terms, indexing the present OSRD library
collection, and abstracting of library books to be
pursued in this order, and concurrently with inquiry-answering
service. Library personnel to be
used in inquiry answering in order to gain experience.
(3.4.3, 3.4.4)

27. Mechanization of a small part of the collection
to be attempted soon.     (4)
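
Recommendation 15 proposes storing functions as approximating polynomials over maximal intervals rather than as dense tables. The following is a minimal sketch of that idea under simplifying assumptions that are not taken from the report: the polynomials are of first degree (a straight line through the interval's endpoints), intervals are grown greedily from left to right, and the acceptance test is a maximum absolute error `tol`. The function names are illustrative.

```python
def _chord_error(xs, ys, i, j):
    """Largest deviation of the tabulated points i..j from the straight
    line joining point i to point j."""
    x0, y0 = xs[i], ys[i]
    slope = (ys[j] - y0) / (xs[j] - x0)
    return max(abs(y0 + slope * (xs[k] - x0) - ys[k]) for k in range(i, j + 1))


def maximal_segments(xs, ys, tol):
    """Greedily cover a table with the longest intervals on which a
    first-degree polynomial reproduces every point to within tol.

    Returns a list of (start_index, end_index) pairs; adjacent segments
    share an endpoint.
    """
    segments = []
    start = 0
    last = len(xs) - 1
    while start < last:
        end = start + 1
        # extend the interval while the next point can still be absorbed
        while end < last and _chord_error(xs, ys, start, end + 1) <= tol:
            end += 1
        segments.append((start, end))
        start = end
    return segments
```

For a table that is exactly linear, the whole range collapses into a single segment, so only two coefficients need be stored. A table with a kink is split at the kink. Higher-degree polynomials or other series, as the recommendation allows, would lengthen the intervals further at the cost of more coefficients per interval.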


1.2.  Organization  of  NSRDS 


Physical properties of materials, which are the
subject matter with which NSRDS deals, have
been divided into a number of technical areas. To
date seven such areas have been defined: (1) nuclear
data, (2) atomic and molecular data, (3)
solid state data, (4) thermodynamic and transport
properties, (5) chemical kinetics, (6) colloid and
surface properties, and (7) mechanical properties.
Other areas may be added later, but these seven
seem to come close to exhausting our present
concern.


The  organization  dealing  with  these  subjects 
consists  of  the  Office  of  Standard  Reference  Data 
(OSRD),  located  at  the  National  Bureau  of  Stand- 
ards, and  a  number  of  Technical  Data  Centers, 
mostly  outside  of  NBS.  Many  of  these  Data  Cen- 
ters are  sponsored  and  operated  by  other  agencies ; 
some  antedate  the  existence  of  NSRDS.  A  few 
are  located  at  NBS,  and  some  operate  elsewhere 
under  contract  with  NBS.  Each  Data  Center  has 
cognizance  over  a  certain  domain,  usually  falling 
within,  but  narrower  than,  one  of  the  seven  tech- 
nical areas.  The  domain  of  a  technical  Data  Cen- 
ter may  be  characterized  by  a  set  of  physical 
properties  (e.g.,  infrared  spectra)  or  materials 
(e.g.,  metals)  or  occasionally  of  other  criteria  (e.g., 
low  temperature),  or  by  a  combination  of  such 
criteria.  The  designation  of  certain  organizations 
as  Technical  Data  Centers  of  NSRDS,  the  delim- 
itation of  their  scope,  and  coordination  of  their 
activities  are  among  the  responsibilities  of  OSRD. 
In  addition  there  are  data  compilation  projects 
directly  under  the  cognizance  of  OSRD. 

It  is  recognized  that  the  presently  existing  Data 
Centers  cover  only  a  small  part  of  the  entire 
domain  of  standard  reference  data.  It  is  desirable 
that  new  centers  should  be  established,  or  old  ones 
expanded,  at  a  rapid  rate.  It  must  be  recognized, 
however,  that  even  in  the  best  of  circumstances 
it  will  take  years  before  Data  Centers  will  even 
approach  complete  coverage  of  the  entire  field  of 
standard  reference  data,  and  it  is  unlikely  that 
such  completeness  will  ever  be  attained.  As  a 
result,  a  larger  burden  will  have  to  be  placed  on 
OSRD,  at  least  initially. 

In  particular,  OSRD  will  have  to  engage  in  a 
survey  of  existing  data  compilations  in  all  fields, 
which  can  be  used  as  a  basis  for  information  serv- 
ices until  a  better  foundation  is  furnished  by  the 
Data  Centers.  It  will  also,  in  some  cases,  contract 
with  organizations  or  individual  scientists  for  pro- 
ducing compilations,  critical  evaluation  and  re- 
views, computation  of  certain  useful  functions 
derived  from  standard  reference  data,  and  even 
experimental  measurements.  All  these  activities 
can  be  provided  by  OSRD  as  opportunities  offer 
themselves,  but  a  more  systematic  and  exhaustive 
coverage  of  the  field  will  depend  on  the  expansion 
of  the  Data  Center  system. 

Within  OSRD  there  is  an  Area  Manager  for 
each  of  the  seven  technical  areas,  plus  one  for  in- 
formation system  design  and  research;  in  addition, 
OSRD  operates  an  Information  Service  at  NBS. 
In  a  certain  sense,  the  organization  and  activities 
of  this  Information  Services  Operation  (ISO)  are 
the  main  subject  of  the  present  report,  although 
some  relevant  questions  about  Data  Center  activi- 
ties will  also  have  to  be  discussed. 

ISO is expected to consist of four units, concerned
with (1) compilation-production services,
(2)  inquiry  services,  (3)  the  data  file  operation, 
and  (4)  analysis  and  user  relations. 


1.3.  Information  Services 

1.3.1.  General 

It  is  the  responsibility  of  the  Information  Serv- 
ices Operation  "to  provide  the  services  to  the 
technical  community  that  are  determined  to  be 
useful  and  maintain  the  collection  of  data  that 
will  constitute  the  data  center  at  the  National 
Bureau  of  Standards— indexing,  filing,  storing, 
and  retrieving  data  as  required." 

We  distinguish  two  kinds  of  services :  scheduled 
and  nonscheduled.  It  is  too  early  to  make  a  firm 
estimate  of  the  relative  size  of  these  two  kinds  of 
information  activities.  Indications  are  that  they 
will  be  of  comparable  magnitude. 

1.3.2.  Compilation-Production  Services 

Scheduled  services  include  the  dissemination  of 
information,  either  periodical  or  occasional,  on 
our  own  initiative.  It  is  expected  that  some  of 
this  will  be  handled,  as  it  is  now  in  a  few  cases, 
by  the  technical  data  centers,  with  or  without  as- 
sistance from  OSRD.  The  central  information 
service  operation  will  not  duplicate  any  of  these 
efforts  but  will  attempt  to  provide  additional 
publications  covering  the  field  of  reference  data 
in  general,  or  cutting  across  the  lines  of  several 
data  centers,  or  falling  between  them.  A  periodi- 
cal current  awareness  service  is  one  of  the  likely 
activities  contemplated  for  ISO.  Preparation  of 
revised  editions  of  data  handbooks  is  an  example 
of  activities  of  the  technical  data  centers. 


In general, the publication of monographs, as
the primary means of supplying data to users, is
perhaps the most important single function of
NSRDS. Such monographs, while concentrating
on the tabulation of critically evaluated data, will
in addition contain such relevant information on
the generation and application of the data as is
likely to be helpful to the user (cf. sec. 3.3.1).

1.3.3.  Inquiry  Services 

We may visualize four kinds of action taken in
response to requests for information: (a) referral;
(b) reference; (c) documentation; (d) data information.
They are increasingly specific in the
order in which they are listed. Referral means
that the question is referred to another organization,
generally one of the technical data centers,
though occasionally an organization outside
NSRDS. It is expected that the requestor would
be informed of the referral. The reply may be
sent to the requestor, preferably via OSRD. By
reference is meant a listing of relevant literature.
Documentation goes one step further and includes
furnishing of micro-stored or hard copies of the
referenced literature. "Data information" implies
furnishing not only a listing of the requested data,
but also any necessary explanation, caution, etc.
(cf. sec. 3.2).

The choice among these actions must be kept
flexible at all times. It would be undesirable to
decide that OSRD will furnish only one kind of
reply.
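
The four kinds of response, ordered by increasing specificity, can be written down as a small data structure. This is only a sketch: the names `Response` and `choose_response`, and the rule of picking the most specific response that is feasible with referral as the fallback, are assumptions made for illustration. The report itself asks only that the choice be kept flexible.

```python
from enum import IntEnum
from typing import Iterable


class Response(IntEnum):
    """Kinds of reply to an inquiry, numbered in order of increasing specificity."""
    REFERRAL = 1          # question passed on to another organization
    REFERENCE = 2         # listing of relevant literature
    DOCUMENTATION = 3     # copies of the referenced literature
    DATA_INFORMATION = 4  # the data themselves, with explanation and cautions


def choose_response(feasible: Iterable[Response]) -> Response:
    """Pick the most specific feasible response.

    Assumption for this sketch: when nothing more specific is feasible,
    the inquiry is referred elsewhere.
    """
    options = list(feasible)
    if not options:
        return Response.REFERRAL
    return max(options)
```

Because no single kind of reply is fixed in advance, the set passed to `choose_response` can differ from one inquiry to the next.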




2.  Mechanized  Information  System  for  NSRDS 


2.1.  General  Considerations 

2.1.1.  Philosophy  of  Computer  Selection 

In  this  section  we  discuss  the  advantages  and 
drawbacks  of  mechanization  primarily  in  the  light 
of  their  concrete  effects,  rather  than  of  their  im- 
ponderable consequences. 

The  use  of  digital  computers  for  information 
retrieval  is  one  of  the  most  widely  discussed  is- 
sues in  science  administration  today.  A  number 
of  entire  new  organizations  are  being  set  up  for 
this  purpose,  and  large  vested  interests  are  at  play. 
In  such  a  situation  one  is  easily  tempted  into  extra- 
neous considerations:  having  a  computer  is  con- 
sidered "good  advertising,"  it  lends  an  appearance 
of  progress  and  importance  to  an  organization,  it 
attracts  prospective  customers.  We  propose  to 
resist  these  temptations  and  to  examine  the  possi- 
ble uses  of  computers  in  NSRDS  strictly  on  their 
merits.  Accordingly,  we  will  compare  the  cost  of 
computer  versus  manual  operation  and  the  cost 
differences  among  different  computer  types;  we 
will  examine  the  advantages  in  speed  and  ease  of 
distant  access  to  a  mechanized  file,  and  the  possible 
loss  in  quality  and  convenience  of  direct  access  to 
a  manual  file  kept  on  the  spot.    We  must  keep  in 


mind, however, that an analysis based on these factors
alone is likely to understate the value of automation.
Experience in other fields has shown that
the introduction of computer methods is often followed
by unforeseeable, or at least unforeseen,
rapid progress in other respects. Noteworthy examples
are in the analysis of x-ray crystallographic
data and of bubble chamber observations,
where the use of computers was followed by automation
of data acquisition and has led to a manifold
expansion in the amount of scientific information
obtained by these methods.

At present there is not enough information
available to enable us to make a quantitative
study of the cost and performance of computers to
be used several years from now. We have to rely
on some general and qualitative observations and
experience in other fields, but there will be time
to verify the findings so obtained before committing
ourselves definitely to one or another course
of action. The arguments to be presented in the
next two sections give qualitative support to the
contention that a digital computer will ultimately
be economical for NSRDS. In addition we shall
argue that sharing time on a large computer is
preferable to operating a smaller computer for
NSRDS alone. Finally, rather than insisting on
complete mechanization of every phase of the
process, we shall aim at an optimal division of labor
between man and machine.


2.1.2.  Arguments  in  Favor  of  Mechanization 

(a) Size of collection. There is as yet no reliable
basis for estimating the magnitude of our
information collection. For planning purposes
we are using an order of magnitude of 100 million
words. (See sec. 2.2.1.) It seems plausible to
us that the ultimate size may differ from this estimate
by possibly a factor of 10, but probably not
by a factor of 100, in either direction. It should
be several years before we get to figures of this
magnitude. Size alone is rarely a sufficient reason
for automation, unless it gets to be truly excessive;
if a collection of 100 million words were to
be used only in the way in which one uses, say, a
telephone directory or library card file, mechanization
would not be worthwhile. Taken together
with the following arguments, however, the estimated
size is an added consideration in favor of
mechanization.

(b) Mechanization becomes advantageous when
there are frequent occasions for searching through
the entire file or large portions of it. An example
is the search for materials whose boiling points lie
between given limits. Whether or not searching
is required depends on the way in which the file is
organized; for example, the telephone directory
is not properly organized for finding people living
on a given street. Indexing is a method of organizing
a file so as to allow the answering of certain
types of questions without a major file search. It
seems extremely unlikely that we should be able to
anticipate all our information needs by measures
of this kind, and therefore file searches will be
needed from time to time. It is impossible at present
to estimate their frequency, but the argument
lends weight to the demand for mechanization.
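
The contrast drawn in (b) between searching the whole file and consulting an index can be sketched as follows. The file layout, the function names, and the use of a sorted list with binary search as the "index" are assumptions for illustration; the boiling points are approximate values in degrees Celsius included only as sample data.

```python
from bisect import bisect_left, bisect_right

# Illustrative file: (material, approximate normal boiling point in deg C)
MATERIALS = [
    ("water", 100.0),
    ("ethanol", 78.4),
    ("benzene", 80.1),
    ("acetone", 56.1),
    ("mercury", 356.7),
]


def scan_range(file, low, high):
    """Full file search: every record is examined, however few qualify."""
    return [name for name, bp in file if low <= bp <= high]


def build_index(file):
    """Organize the file by boiling point once, in advance.

    Returns parallel lists (keys, names) sorted by boiling point.
    """
    pairs = sorted((bp, name) for name, bp in file)
    return [bp for bp, _ in pairs], [name for _, name in pairs]


def index_range(index, low, high):
    """Answer the same question from the index by two binary searches,
    without touching records outside the requested limits."""
    keys, names = index
    return names[bisect_left(keys, low):bisect_right(keys, high)]
```

An index of this kind serves only the question it was built for. A request by some other property, say density, finds no suitable index and falls back to something like `scan_range`, which is the situation the text expects to arise from time to time.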

(c) Updating a file, correction of errors, addition
of new results, etc., are greatly facilitated by
mechanization. This consideration has prompted
us, for example, to use cards and machine methods
of type composition in preparing the next edition
of "Crystal Data Tables." It is likely that the
extra cost incurred will be more than offset by savings
in preparing the first subsequent edition.

(d) Methods of bibliographic coupling and of
citation referencing are greatly aided by mechanization.
Arguments will be advanced in section
[illegible].2.4 below to show that these methods are of
particular importance for the Standard Reference
Data Program.

(e) The operation both of OSRD and of the
data centers will be facilitated by remote access
to the file, which in turn presupposes mechanization.
More will be said about this below. (See
secs. 2.5.1, 2.5.2.)

(f) In some of the data centers, the introduction
of machine methods will be aided by the fact
that some of the experimental measurements used
in the generation of data are set up for automatic
recording and digital encoding of results.

2.1.3.  Obstacles  to  Mechanization 

The  introduction  of  computers  into  the  process 
of  information  retrieval  is  hindered  by  the  same 
difficulty  which  characterizes  most  other  computer 
applications :  our  present  inability  to  give  a  rigor- 
ous description  of  the  procedure  which  the  com- 
puter is  to  follow.  In  human  information  re- 
trieval, e.g.,  in  a  library,  not  only  is  the  memory  of 
the  librarian  an  important  tool,  but  every  user  of 
a  library  uses  clues,  lines  of  reasoning,  and  other 
mental  processes  of  which  he  himself  is  not  aware. 
Such  an  imperfectly  formulated  procedure  is  well 
adapted  to  the  human  mind  but  is  of  no  help  with 
computers. 

There  are  two  ways  to  overcome  our  lack  of  un- 
derstanding of  the  problem.  One  is  a  program  of 
research  into  the  formulation  of  the  information 
retrieval  problem.  For  this,  in  turn,  there  are  two 
alternatives :  investigate  and,  if  possible,  formalize 
the  customary  human  procedure;  or  develop  new 
methods  which  are  more  suitable  for  machines.  It 
is,  of  course,  not  true  that  the  computer  would 
have  to  use  the  same  procedure  as  is  used  by  hu- 
mans; but  it  has  to  use  some  procedure  which  is 
completely  formulated,  and  one  way  to  formulate 
it  is  to  start  with  the  familiar  human  procedure : 
try  to  make  explicit  the  mental  processes  which  we 
use  without  being  aware  of  them,  see  whether  they 
contain  any  elements  which  can  be  formalized,  and 
if  so,  translate  them  into  computer  programs.  If 
this  is  possible,  it  may  be  preferable  to  inventing  an 
entirely  new  and  untried  approach  to  the  problem. 

The  other  way  out  is  partial  mechanization : 
limit  the  use  of  the  computer  to  those  fragments 
of  the  whole  problem  for  which  a  rigorous  for- 
mulation can  readily  be  found,  and  leave  the  rest  to 
humans  as  before.  In  this  case  one  has  to  pay 
special  attention  to  the  interaction  between  man 
and  machine,  to  the  smooth  transfer  of  informa- 
tion from  one  to  the  other.  This  brings  up  another 
problem  possibly  as  formidable  as  the  first :  often 
input  and/or  output  are  the  principal  bottlenecks 
in  automatic  computation.  Indeed,  there  are  clas- 
sical cases  of  systems  in  which  the  introduction  of 
computers  was  not  profitable  until  all  functions  of 
the  system  had  been  mechanized,  thus  reducing  the 
relative  magnitude  of  input  and  output.  There 
are  other  examples  in  which,  on  the  contrary,  the 
use  of  computers  was  made  uneconomical  by  the 
attempt  at  complete  mechanization,  resulting  in 
excessively  complicated  computer  programs;  a 
more  modest  approach,  limited  to  the  most  fre- 
quently required  functions  of  the  system,  gave  most 
of  the  benefits  for  a  small  fraction  of  the  cost. 

Thus,  a  reasoned  choice  between  the  two  possible 
approaches  must  be  made.  The  decision  does  not 
necessarily  have  to  go  all  one  way;  compromises 
are  possible.    For  the  information  system  which
we  are  considering  here,  it  seems  likely  that  some
functions  should  be  reserved  to  humans  for  a  long 
time  to  come,  possibly  forever.  We  are  thinking 
especially  of  the  checking,  screening,  and  editing 
of  output,  and  of  the  formulation  of  questions  by 
successive  steps.  The  latter  problem  may  be 
viewed  as  that  of  assigning  a  sufficient  number  of 
pockets  for  information  to  assure  that  no  single 
pocket  contains  more  than  a  few  items.  Retrieved 
information  would  then  be  in  the  form  of  a  pocket 
of  information  or  a  small  number  of  such  pockets. 
Resolution  beyond  this  point  by  machine  methods 
alone  may  not  be  economical. 

On  the  other  hand,  the  problems  in  man-machine 
communication  which  these  functions  generate, 
although  they  are  severe,  do  not  seem  insurmount- 
able. Indeed,  we  envision  that  the  human  user 
will  interfere  repeatedly  in  the  computer  process, 
aided  by  conveniently  designed  facilities  for  insert- 
ing instructions  and  for  display  of  intermediate 
computed  results.  Such  collaboration  or  dialogue 
between  user  and  machine  is  expected  to  be  a  char- 
acteristic feature  of  our  information  system. 

2.1.4.  Size  and  Location  of  Computer 

Anyone  in  need  of  the  services  of  a  computer 
should  first  examine  whether  to  acquire  a  com- 
puter exclusively  for  his  own  use  or  to  obtain  time 
on  somebody  else's  computer.  (We  may  dis- 
regard the  third  alternative,  of  acquiring  a  com- 
puter  and  making  some  of  its  time  available  to 
outsiders.)  For  prospective  users  in  the  Federal 
government  such  an  examination  is  specifically 
prescribed  by  the  Bureau  of  the  Budget. 

Of the various uses of the computer in the work of NSRDS,
probably the most demanding requirement is the retrieval of
specific items of information, or sets of such items, in
response to requests. It will be seen that other uses (e.g.,
updating the information file, assistance in publication,
housekeeping) are less exacting and fit well into a system
designed specially for the information retrieval functions.

In  a  typical  problem  in  this  area,  a  portion  of 
the  computer-stored  information  file  is  read  and 
each  item  in  the  file  compared  with  the  question 
being  asked,  to  see  whether  the  file  item  is  relevant 
to  the  question.  On  a  large  fast  computer  only 
a  few  microseconds  are  required  for  each  com- 
parison; for  those  few  items  which  are  found  to 
be  relevant,  a  longer  process  of  evaluation  and 
output  takes  place.  Time  is  saved  if  a  number 
of  questions  are  treated  simultaneously.  One 
may,  for  example,  collect  the  questions  arriving 
in  the  course  of  a  day  into  one  batch  and  answer 
them  in  a  single  computer  run.  Each  question 
needs  to  draw  on  only  a  certain  portion  of  the 
information  file,  and  for  each  portion  of  the  file 
there  will  be,  on  the  average,  a  small  number  of 
questions  to  be  considered. 

There are so many details as yet unknown that we can obtain
at best an order-of-magnitude estimate of the computer time
involved. Computers of the incoming generation take about
1 microsecond per (logical) instruction. If each word from the
file is compared with an average of 5 questions, and each
comparison takes 4 instructions, the computer will spend
20 microseconds per word; with a file of, say, 100 million
words the daily computer time would be 2000 seconds. We shall
see that the rate of transferring information from store to
central processor can just keep up with the speed of
computation. If a smaller and slower computer were used, it
could be kept busy full time. (On the Bureau's present
computer, IBM 7094, the time would be something less than
2 hours per day.) For supporting figures see section 2.4.5.
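The arithmetic behind this estimate can be written out as a short sketch. The function name and parameter names below are illustrative assumptions; the figures are the rough ones quoted in the text, not measurements.

```python
# Order-of-magnitude estimate of the daily computer time for one pass over
# the file. All figures are the rough assumptions quoted in the text.

def daily_scan_seconds(file_words, questions_per_word,
                       instructions_per_comparison,
                       microseconds_per_instruction):
    """Seconds needed to compare every file word with its share of questions."""
    microseconds_per_word = (questions_per_word
                             * instructions_per_comparison
                             * microseconds_per_instruction)
    return file_words * microseconds_per_word / 1_000_000


seconds = daily_scan_seconds(
    file_words=100_000_000,          # assumed size of the file, in words
    questions_per_word=5,            # average questions compared with each word
    instructions_per_comparison=4,   # instructions per comparison
    microseconds_per_instruction=1,  # speed of the "incoming generation"
)
print(seconds)        # 2000.0 seconds
print(seconds / 60)   # roughly 33 minutes
```

Changing any one factor scales the result linearly, which is why the text treats the figure as no better than an order of magnitude.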

In the choice between large and small computers, the large
ones are normally less expensive per unit of work; typically,
their hourly cost might be greater by a factor of ten, their
output by a factor of 100. Small computers can be justified
only on grounds other than cost, e.g., convenience of access,
or mismatch between the internal speed of a large computer and
its rate of input or output. No serious argument of this kind
appears to be valid in our case, as will be seen in section
2.4.5. On the other hand, there is a further argument in favor
of a late-vintage large computer: OSRD is expected to lead the
way in applying and demonstrating improved ways of retrieving
standard reference data; use of front-line equipment offers
many opportunities for exercising such leadership.

If OSRD is to use a large fast computer, it will, in the
foreseeable future, do so by obtaining computer time from a
laboratory operating such a computer, since its own needs would
not keep such a computer fully occupied. This leads to the
question whether computer programming service should likewise
be obtained from another laboratory, or whether OSRD should
build up its own programming staff. To this question we do not
have a clear-cut answer at present. There is an increasing
trend toward professional specialization in programming, as a
result of which a computation laboratory is in a better
position to select and train competent programmers, keep them
fully employed in their special field, and offer them
professional advancement. In an organization like OSRD, a
professional programmer is intellectually isolated, faced with
a fluctuating workload of what to him are "odd jobs." On the
other hand, some aspects of computer programming demand
intimate familiarity with the OSRD data file organization and
benefit from the devotion of a staff whose first loyalty is to
OSRD. Another approach is to have computer programs written by
teams consisting of people from both organizations:
professional programmers from the computation laboratory
teamed with data specialists with detailed programming
experience from OSRD; the "interface" or communication link
between these people might consist of flow charts and data
sheets.

In the absence of a strong argument to the contrary, it seems
natural that OSRD should use the large general-purpose computer
expected to be available at NBS. While remote access to
computers over long distances is increasingly coming into use,
it is economically limited to certain special situations: very
short problems, or frequently needed large problems which
remain unchanged for years, so that the program and any tabular
data are permanently stored at the computer site. In many ways
close proximity to a computer is advantageous, since we
envisage the OSRD operation as gradually evolving, with
frequent changes in computer programs and additions of large
quantities of data. Such operations are facilitated by personal
contact of machine operators, programmers and users, by
hand-carrying of large card decks, and sometimes by the user's
ability to influence the policies of the computation
laboratory, all of which argues in favor of using the
general-purpose NBS computer.

The same reasons can be advanced against the proposal that
OSRD join with some or all of the technical data centers of
NSRDS for the establishment of a common computer laboratory.
An additional argument against such a plan is the political
difficulty of reaching agreement with the different data
centers, many of whom have their own vested interests in
computing laboratories located at their affiliations.

There is little likelihood that OSRD would outgrow the
sharing arrangement. Present estimates indicate that a few
years from now OSRD will use a small fraction of the time
available on present-day computers. Even if OSRD's requirements
should eventually grow far beyond this estimate, it is likely
that computers will also have become much faster by that time.

2.1.5.  Computer  Characteristics 

If, as proposed in the preceding section, OSRD shares in the
general-purpose computer of NBS, provision will have to be made
for certain peculiarities of the OSRD operation.

The amount of data to be stored for OSRD is so great that it
is impractical to keep them on cards or tape and read them into
the computer for each run anew. In this respect OSRD is, and
probably will remain, unique among NBS computer users. It will
be necessary to acquire a separate storage component, which
would be purchased or rented by OSRD and reserved for its
exclusive use, and which would be connected to the computer
main frame. Suitable devices are now commercially available
from several manufacturers. A secondary question which can be
resolved later is whether this storage unit should also contain
the program instructions, an arrangement which is probably
economical but possibly in conflict with the operating system
of the computer (cf. secs. 2.4.4, 2.4.5).


It will be mandatory, or at least highly desirable, to have
facilities for remote on-line access to the central computer.
A console should be located in the OSRD offices (and there will
probably be a demand for a number of similar consoles elsewhere
in the Bureau) from which a user can contact the central
computer, wait for the end of the current problem (or in the
case of a long problem, interrupt it), read into the computer a
small amount of instructions and data, have the instructions
executed and immediately see a small volume of results.
Large-volume output would remain on tape in the computer room
and be available to the user there. The program should have
access to routines and data permanently stored in the
computer's internal memory, and should be able to connect the
computer, under its own control, to tape stations and special
external storage devices such as the one postulated in the
preceding paragraph (cf. sec. 2.5.1).

It will also be desirable to have the ability to use similar
remote stations in distant cities, using commercial telephone
lines. This will enable individual scientists and engineers to
obtain needed data and related bibliographic information with a
minimum of effort and time loss (cf. sec. 2.5.2).

In  order  to  enable  OSRD  to  be  compatible  with 
the  Technical  Data  Centers  and,  in  many  cases, 
introduce  recommendations  for  common  practice 
for  their  information  handling,  the  NBS  computer 
should  be  able  to  accept  programs  written  in  the 
more  widely  used  programming  languages,  espe- 
cially those  standardized  by  the  American  Stand- 
ards Association.  The  role  of  NSRDS  will  be 
facilitated  if  the  NBS  computer  is  of  a  kind  com- 
mercially available  throughout  the  United  States. 

Finally,  as  we  have  already  said,  the  computer 
should  be  in  the  front  line  of  development  of  large, 
fast,  and  powerful  computers,  in  order  to  enable 
OSRD  to  discharge  its  responsibility  of  establish- 
ment of  standards  of  quality,  methodology  includ- 
ing machine  processing  formats  and  such  other 
functions  as  are  required  to  ensure  the  compat- 
ibility of  all  units  of  the  NSRDS. 

The  development  of  special-purpose  computers 
for  information  retrieval  will  have  to  be  watched. 
It  is  not  yet  clear  whether  current  efforts  in  this 
direction  will  be  successful  nor,  if  they  are,  to 
what  extent  their  novel  features  will  come  to  be 
included  in  future  general-purpose  computers. 

2.2.  Functions  of  the  System 

2.2.1.  Information  Retrieval 

Under  this  heading  we  consider  the  reaction  of 
the  system  to  (unscheduled)  requests  for  informa- 
tion. The  nature  of  these  requests  can  be  inferred 
from  the  experience  of  existing  specialized  data 
centers.  It  appears  that  the  frequency  of  such 
requests  ranges  from  perhaps  a  few  dozen  to  over 
a  thousand  per  year,  depending  on  the  scope  of 
the  center.     Since  OSRD  is  broader  than  any  of
the  centers  examined,  its  work  load  should  at  least
equal  the  higher  of  these  figures;  that  is  to  say, 
we  should  expect  to  start  with  several  requests  per 
day  as  soon  as  the  availability  of  OSRD  has  be- 
come known,  and  to  grow  far  beyond  that  num- 
ber as  scientists  and  engineers  learn  to  use  the 
service.  It  has  been  the  experience  of  other  centers 
that  a  major  part  of  these  requests  is  not  for  data 
but  for  "administrative"  information— avail- 
ability of  publications,  addresses  of  organizations, 
etc. — and  technical  problems  other  than  data 
themselves. A large portion of the questions come from the
immediate neighborhood of the center (people in other parts of
the same organization), which suggests that there is a need for
such information in technical laboratories but that the present
cumbersome methods of retrieval discourage most potential
users. This observation reinforces our argument for remote
access to the mechanized data collection (cf. sec. 2.1.5).

Among requests for data we distinguish mainly two kinds.
Those of the first kind ask for a specified property or group
of properties, of a specified material or group of materials,
for specified conditions or ranges of conditions. For instance,
one might ask for the optical density of water at 20 °C, at a
given wavelength, or one might demand a table of the specific
heats, entropies, and enthalpies of the noble gases between
0 and 100 °C. Questions of the second kind ask for materials
for which certain properties have specified values, or lie
within specified ranges; for instance, alcohols with molecular
weights not over 102, whose boiling points at atmospheric
pressure lie between 60 and 100 °C. In this second class of
questions are also the problems of identifying materials from
their spectra, from crystallographic or other properties.

These  two  types  of  questions  are  analogous  to 
the  direct  and  inverse  use  of  a  mathematical 
table — to  find  values  of  the  tabulated  function,  or 
to  find  those  arguments  for  which  the  function 
has  given  values.  In  the  case  of  properties  of 
materials,  certain  other  kinds  of  questions  are  also 
possible  (e.g.,  which  of  the  spectral  lines  of  mer- 
cury is  narrowest?)  but  they  are  comparatively 
rare.  Questions  of  the  first  kind  will  probably 
predominate,  if  the  experience  of  existing  data 
centers  may  be  taken  as  a  guide. 
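The distinction between the two kinds of question can be illustrated with a small sketch. The table layout and the function names are assumptions made for this example, and the property values are rounded handbook-style figures included only for illustration.

```python
# Direct and inverse questions asked of a small property table.
# The layout is hypothetical; values are rounded and illustrative only.

TABLE = {
    "methanol":   {"molecular_weight": 32.04, "boiling_point_c": 64.7},
    "ethanol":    {"molecular_weight": 46.07, "boiling_point_c": 78.4},
    "1-propanol": {"molecular_weight": 60.10, "boiling_point_c": 97.2},
    "1-butanol":  {"molecular_weight": 74.12, "boiling_point_c": 117.7},
}


def direct_lookup(material, prop):
    """First kind: the value of a given property of a given material."""
    return TABLE[material][prop]


def inverse_lookup(conditions):
    """Second kind: materials whose properties all lie in given (low, high) ranges."""
    return sorted(
        name for name, props in TABLE.items()
        if all(low <= props[prop] <= high
               for prop, (low, high) in conditions.items())
    )


print(direct_lookup("ethanol", "boiling_point_c"))
print(inverse_lookup({"molecular_weight": (0, 102),
                      "boiling_point_c": (60, 100)}))
```

With the table keyed by material, the direct question is a single access, while the inverse question has to examine every entry. This is one concrete sense in which the ease of answering depends on how the tables are organized.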

The  ease  with  which  a  question  of  either  type 
can  be  answered  depends  crucially  on  the  size  of 
the  tables  and  on  the  way  in  which  they  are  or- 
ganized. The  latter  will  be  discussed  in  a  subse- 
quent section  of  this  report.  As  to  size,  there  is 
first  of  all  an  almost  unlimited  number  of  chemi- 
cal compounds;  those  on  which  significant  data 
are  available  may  number  100,000,  and  this  num- 
ber is  growing  rapidly.  There  are  perhaps  1000 
different  properties  within  the  scope  of  NSRDS. 
Some  of  these  are  represented  by  single  numbers, 
others are functions of one or more variables; each of the
latter is represented by perhaps several hundred numbers (cf.
sec. 2.4.3 below). There are few materials for which all this
information exists, especially if the less dependable
measurements are omitted. We estimate vaguely that the average
number of reliably measured data for each of the 100,000
materials is at present well below 1000, and will reach and
pass that number some years from now. A collection of all these
data will then amount to 100 million words.2

We can obtain an independent check on this figure, crude as
it is, in the following way. The present rudimentary library of
OSRD has about 600 volumes, of which about 400 are data. At an
average of 400 pages per volume and 500 words per page, this is
80 million words. Most of these volumes are not entirely filled
with data but contain large sections of text; if tabular they
often include large blank spaces; and there is much overlap in
the contents of different volumes. The number of separate data
items in this collection might be between 10 and 20 million.
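The two size estimates can be set side by side in a few lines. The variable names are illustrative; the figures are the rough ones given in the text.

```python
# Two independent rough estimates of the size of the data collection.

# Estimate 1: number of materials times data items per material.
materials = 100_000
data_per_material = 1_000            # expected "some years from now"
file_words = materials * data_per_material

# Estimate 2: word count of the data volumes in the present OSRD library.
data_volumes = 400
pages_per_volume = 400
words_per_page = 500
library_words = data_volumes * pages_per_volume * words_per_page

# Share of the library word count that may be separate data items.
low_share = 10_000_000 / library_words
high_share = 20_000_000 / library_words

print(file_words)             # 100000000
print(library_words)          # 80000000
print(low_share, high_share)  # 0.125 0.25
```

On these figures, only about one eighth to one quarter of the library's word count consists of separate data items, which fits the remark that the present number of data per material is well below 1000.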

2.2.2.  File  Updating 

The maintenance of files is a standard computer problem,
common to numerous applications; it has been the subject of a
substantial technical literature. In many respects the
requirements of NSRDS are not different from those of other
applications, and can be handled by standard methods. File
updating, that is, the insertion of new entries into the master
file and the replacement of any old ones which need correction,
is normally done during the same computer runs which are made
for the purpose of information retrieval. If any new
information for insertion in, or correction of, the file is
received by the laboratory at any time between two such
computer runs, it is recorded on tape either at once or at any
time before the next computer run. Immediately before this run
all the accumulated new information is arranged in the order in
which it is to be entered into the master file. The main
computer run then consists in reading the master file, one
entry at a time. After reading each entry we first examine the
next item on the "new information" list (tape) to see whether
it is to be inserted before this entry or modifies it. If
neither, we put the master file entry through the information
retrieval routine and proceed to the next master entry. If,
however, the next "new information" item does call for
insertion or correction, this is carried out, the new or
corrected item is put through the information retrieval
routine, then the following "new information" item is examined
in the same way, etc.
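The combined update-and-retrieve run described above can be sketched as a single merge pass over two sorted lists. This is a minimal illustration under stated assumptions, not the procedure of any actual installation: entries are taken to be (key, value) pairs, a change with an existing key is treated as a correction, and the names `update_and_retrieve` and `is_relevant` are invented for the example. The handling of changes that sort after the last master entry is also an addition not spelled out in the text.

```python
# One pass over the master file that applies pending changes and runs the
# retrieval test on every current entry. Both lists must be sorted by key,
# and keys within the change list are assumed to be unique.

def update_and_retrieve(master, changes, is_relevant):
    new_master, hits = [], []
    pending = 0  # position of the next unused "new information" item

    def emit(entry):
        """Write an entry to the new master file and test it for retrieval."""
        new_master.append(entry)
        if is_relevant(entry):
            hits.append(entry)

    for key, value in master:
        # Insert every new item that belongs before this master entry.
        while pending < len(changes) and changes[pending][0] < key:
            emit(changes[pending])
            pending += 1
        if pending < len(changes) and changes[pending][0] == key:
            emit(changes[pending])   # correction replaces the old entry
            pending += 1
        else:
            emit((key, value))       # unchanged entry
    # Assumption: new items sorting after the last master entry are appended.
    while pending < len(changes):
        emit(changes[pending])
        pending += 1
    return new_master, hits


master = [("argon", "old"), ("helium", "old"), ("neon", "old")]
changes = [("helium", "corrected"), ("krypton", "new"), ("xenon", "new")]
new_master, hits = update_and_retrieve(
    master, changes, lambda entry: entry[0] in {"helium", "krypton"})
print(new_master)
print(hits)
```

Because corrections are applied before an entry reaches the retrieval test, the retrieved items always reflect the updated file, which is the property the text emphasizes.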

We expect to use the same procedure in updating the file of
standard reference data, except to use one of the new erasable
mass storage media in place of tape. Thus, the only additional
effort required for updating is that of actually examining the
new information and making the changes in the master file;
since normally only a small number of changes occur, this
effort is small compared to that of reading the entire master
file. The latter operation need not be carried out beyond the
extent necessary for information retrieval. At the same time,
information retrieval is always based on completely up to date
information, since all corrections are made before an old item
is used for retrieval.

We have no estimate of the rate at which new information will
flow into the system, but it is obvious that if retrieval and
updating runs are made daily or even weekly, only a small
fraction of the file items will be affected by updating in an
average run.

2 Conventionally an average word is thought of as 5 [...]
characters. Many contemporary computers use words of 6
characters (36 bits). Most data items are only 3 or 4 decimal
digits (10-13 bits) but because of redundant notation occupy
space equivalent to about one word.

2.2.3.  Aids  to  Publication

"Aids to and from publication" would be an equally
appropriate heading; sometimes the existence of
machine-readable information files is a help in editing such
material for publication; at other times, the creation of such
files is facilitated by steps taken primarily for the purpose
of publication.

The use of machine-readable material in the publication
process can be advantageous in several ways. The most obvious
case is that in which the information is numerical and has been
produced on a computer, so that the entire costly process of
manual typesetting is avoided. For years, tables of such
numbers have been produced on typewriters or line printers
controlled by punched cards or magnetic tape; these are of
limited flexibility, the resulting printed copy suffers from
poor readability, and column headings, pagination, etc.,
present annoying problems. Since the introduction of
tape-driven photocomposition devices has made it possible to
produce printed output of letterpress quality directly by
computer, at costs comparable to those of manual typesetting,
there is no reason why computer-produced material should ever
have to be hand-set for printing.

Another reason for computer-controlled type composition is
the facility for rearranging or otherwise revising the
material. This is important whenever the same material must be
printed in several arrangements. An example is the volume
"Crystal Data Tables" now being prepared for publication by
photocomposition. The original information is being keypunched
in essentially the same format in which it is to appear in
print, and this part of the operation offers no great advantage
over manual typesetting. But from the keypunched information it
will be possible to produce alphabetical indexes for authors,
names and formulas of chemical compounds, all by automatic
sorting, checking, and editing. In subsequent editions, only
the new or revised material will have to be newly keypunched;
the computer will insert it in the proper place, change
pagination as needed, and update the indexes. In addition the
computer can perform a large number of checks based on the
characteristics of the information: the crystallographic data
satisfy certain inequalities, abbreviations for journals
occurring in literature references must be taken from standard
lists, the order of items can be checked, etc. This saves a
large part of the proofreading effort. Against these savings
must be reckoned the effort of writing special computer
programs.
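The kinds of automatic check mentioned here can be sketched as follows. Everything specific in the example is an assumption: the record fields, the two-entry journal list, and the use of "cell edges must be positive" as a stand-in inequality. The text does not say which inequalities or lists are actually used for "Crystal Data Tables."

```python
# Illustrative consistency checks on keypunched records.
# Field names, the journal list and the sample records are hypothetical.

STANDARD_JOURNALS = {"Acta Cryst.", "Z. Krist."}  # stand-in for a master list


def check_records(records):
    """Return (record_index, message) for every check that fails."""
    problems = []
    for n, rec in enumerate(records):
        if rec["journal"] not in STANDARD_JOURNALS:
            problems.append((n, "journal abbreviation not in standard list"))
        if any(edge <= 0 for edge in rec["cell_edges"]):
            problems.append((n, "cell edge must be positive"))
        if n > 0 and rec["sort_key"] < records[n - 1]["sort_key"]:
            problems.append((n, "record out of order"))
    return problems


records = [  # placeholder values, not real crystal data
    {"sort_key": "A", "journal": "Acta Cryst.", "cell_edges": (1.0, 2.0, 3.0)},
    {"sort_key": "C", "journal": "J. Unknown",  "cell_edges": (1.0, 2.0, 3.0)},
    {"sort_key": "B", "journal": "Z. Krist.",   "cell_edges": (1.0, -2.0, 3.0)},
]
problems = check_records(records)
for index, message in problems:
    print(index, message)
```

Each check is cheap to state once the information is machine-readable, which is the sense in which a large part of the proofreading effort can be saved.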

The  publication  of  "Crystal  Data  Tables"  is  an 
example  of  a  situation  in  which  the  desire  to  use 
automatic  type  composition  furnishes  the  incentive 
for  recording  the  information  in  machine-readable 
form,  and  thus  aids  in  the  creation  of  mechanized 
files.  Other  instances  of  this  kind  will  probably 
be  bibliographies  and  acquisition  lists  which  are 
to  be  published  in  cumulative,  updated  form  at 
frequent  intervals.  There  will  be  other  cases  in 
which  automatic  type  composition  will  become  at- 
tractive only  after  the  information  has  been  re- 
corded on  tape  for  a  different  purpose. 

2.2.4.  Other  Computer  Functions 

Even  before  a  mechanized  information  file  for 
retrieval  and  updating  has  been  created,  computer 
methods  can  be  used  for  a  number  of  housekeeping 
functions.  It  has  been  the  experience  of  other 
information  centers  that  some  of  these  functions 
are  more  easily  mechanized  than  the  information 
storage  and  retrieval  itself. 

For  example,  even  while  OSRD  still  uses  a 
manual  information  retrieval  system  based  on  con- 
ventional library  practices,  a  computer  could  con- 
ceivably be  used  to  keep  track  of  purchase  orders, 
accessions,  shelving,  classification,  and  circulation 
(loans  of  books  to  users) .  It  is  possible,  however, 
that  mechanization  at  this  stage,  while  practiced 
by  some  other  installations,  may  not  be  economical 
for  OSRD  because  of  the  small  size  of  its  library. 

It  may  be  desirable  to  keep  statistical  informa- 
tion on  the  requests  for  information  which  are 
acted  on  by  OSRD.  Such  information  will  be  a 
promising  candidate  for  automation  after  a  brief 
initial  period  of  manual  handling,  during  which 
the  staff  becomes  familiar  with  the  number  and 
types  of  questions  to  be  expected. 

Similarly,  machinable  records  may  be  helpful 
in  the  indexing  of  the  information  collection  ac- 
cording to  properties,  materials  and  certain  classes 
of  materials,  and  some  other  characteristics  (e.g., 
by  checking  manually  introduced  index  terms 
against  computer-stored  master  lists  of  such  terms 
to  insure  uniform  nomenclature) . 

It  may  be  desirable  to  maintain  records  of 
sources  of  information  other  than  OSRD's  own 
collection.  These  sources  are  primarily  of  two 
kinds :  on  the  one  hand,  knowledgeable  individuals 
and  organizations  for  referral,  and  on  the  other 
hand,  books,  papers,  and  unpublished  reports. 
The  former  are  probably  too  small  in  number  to 
warrant  use  of  machine  methods.     The  latter  are 


numerous,  and  should  be  the  subject  of  a  mecha- 
nized file  system  if  it  is  decided  that  OSRD  will 
make  use  of  the  scientific  literature  to  any  extent. 

While it is debatable which of these housekeeping operations
should be mechanized before the main operation of OSRD is
switched over to computer use, all of them are likely to be
among the functions of the ultimate computer system.

Apart  from  housekeeping  operations  there  is 
an  entirely  different  computer  function  which 
promises  to  be  particularly  useful  in  the  Standard 
Reference  Data  Program,  namely  the  retrieval  of 
information  by  means  of  citation  indexing  and  the 
related  method  of  bibliographic  coupling.  A  cita- 
tion index  is  obtained  by  recording,  for  each  scien- 
tific paper  or  report  belonging  to  a  given  field  of 
knowledge,  all  the  papers  cited  in  it  (customarily 
these  are  shown  as  a  "list  of  references"  at  the  end 
of  the  citing  paper)  ;  and  then  sorting  this  record 
in  order  of  the  cited  papers.  Thus  we  obtain  for 
each  paper  a  list  of  places  where  it  has  been  cited. 
Suppose  now,  for  example,  that  a  scientist  wishes 
to  find  the  latest  value  for  the  atomic  weight  of 
some  element.  He  knows  that  this  was  measured 
some  years  ago,  and  that  it  may  have  been  revised 
since.  He  enters  the  index  with  the  latest  publica- 
tion on  this  subject  known  to  him — perhaps  5  or 
10  years  old —  in  the  hope  that  the  publication  of 
a  subsequent  revision  would  reference  the  previous 
result.  It  is  obvious  that  this  kind  of  problem  oc- 
curs frequently  in  operations  such  as  the  NSRDS 
data  centers.  Other  applications  of  a  citation 
index  are  in  bibliographic  coupling  (finding  pa- 
pers which  are  related  to  a  given  paper  in  the  sense 
of  citing  some  of  the  same  literature),  prepara- 
tion of  bibliographies,  current  awareness  pro- 
grams, finding  reviews  of  a  given  paper  or  correc- 
tions to  it,  etc.  These  examples  may  suffice  to  show 
that  a  citation  index  is  not  only  a  useful  tool  in 
many  scientific  undertakings  in  a  general  way,  but 
is  also  particularly  applicable  in  an  information 
system  such  as  NSRDS. 
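The construction just described can be sketched in a few lines. This is only an illustration: the paper identifiers and reference lists are hypothetical, and the function names are assumptions introduced here, not part of any NSRDS program.

```python
from collections import defaultdict

# Hypothetical reference lists: citing paper -> papers it cites.
references = {
    "P1960": ["P1951", "P1955"],
    "P1963": ["P1955", "P1960"],
    "P1965": ["P1951", "P1960"],
}

def build_citation_index(references):
    """Re-sort the citing -> cited record in order of the cited papers."""
    index = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            index[cited].append(citing)
    return {cited: sorted(citing) for cited, citing in index.items()}

def coupling_strength(references, a, b):
    """Bibliographic coupling: number of references shared by papers a and b."""
    return len(set(references[a]) & set(references[b]))

citation_index = build_citation_index(references)
print(citation_index["P1960"])                          # later papers citing P1960
print(coupling_strength(references, "P1963", "P1965"))  # shared references
```

Entering the index with the latest paper known to the searcher ("P1960" here) returns the later papers that cite it, which is the atomic-weight search described above.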

2.3.  Input  to  the  File 

2.3.1.  Keypunching 

If,  as  we  have  said,  the  information  file  will 
contain  about  100  million  numbers,  then  the  prob- 
lem of  recording  all  these  numbers  initially  in 
machine-readable  form  is  considerable.  Let  us 
assume,  for  example,  that  the  information  is  to 
be  punched  into  cards.  Experienced  organiza- 
tions like  the  Bureau  of  the  Census,  where  key- 
punching is  done  on  a  large  scale,  estimate  the 
cost  at  10  to  20  cents  per  card.  Each  card  holds 
80  decimal  digits.  Our  data  may,  on  the  average, 
have  3  to  4  significant  digits  each,  but  since  they 
vary  in  magnitude  one  may  have  to  set  aside  5  to 
6  card  columns  for  each  number.  Some  card  col- 
umns are  needed  for  identification;  on  an  average, 
we  may  get  10  data  numbers  per  card,  or  a  total 
of  about  10  million  cards,  at  a  keypunching  cost
on  the  order  of  1  million  dollars.  This  is  for  the 
initial  effort;  it  would  be  followed  by  somewhat 
smaller  annual  outlays  for  updating. 
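The estimate can be checked with a short calculation. The figures are those quoted in the text; the variable names are illustrative, and costs are kept in whole cents so the arithmetic is exact.

```python
total_numbers = 100_000_000        # size of the information file
numbers_per_card = 10              # average, after identification columns
cost_cents_per_card = (10, 20)     # keypunching cost range, cents per card
columns_per_number = (5, 6)        # card columns set aside per number

cards = total_numbers // numbers_per_card
low_dollars, high_dollars = (cards * c // 100 for c in cost_cents_per_card)
data_columns = tuple(numbers_per_card * c for c in columns_per_number)

print(f"{cards:,} cards, ${low_dollars:,} to ${high_dollars:,}")
print(f"{data_columns[0]} to {data_columns[1]} of 80 columns hold data")
```

The result, 10 million cards at 1 to 2 million dollars, agrees with the order of magnitude stated above, and 50 to 60 of the 80 columns carry data, leaving the rest for identification.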

This  is  not  prohibitive  in  comparison  with  the 
size  and  cost  of  the  entire  NSRDS  operation,  but 
it  is  large  enough  to  warrant  serious  study.  For- 
tunately there  are  alternative  ways  of  original  re- 
cording for  at  least  part  of  the  collection. 

Parenthetically, one might reflect for a moment
that 10 million cards fill about 200 file cabinets,
again a large but not prohibitive number. There
should be no reason, however, for storing all these
cards simultaneously for any length of time; storage
for permanent record would presumably be on
tapes or other magnetic media. For instance,
about 100 to 200 reels of ordinary magnetic tape
would suffice to hold the entire collection.


2.3.2.  Print  Reading 




The art of automatic reading of printed copy
and recording it in computer-readable form has
been developed to a point where it is possible in
most cases, and economical in some. The principal
difficulties now are not in the machine recognition
of printed characters but in paper handling,
turning of pages in books, registration (precise
location of copy relative to the reading device),
dirt and other imperfections of printed copy, special
symbols and unusual type fonts, etc. Printed
tables of numbers are relatively simple and free
of most of those difficulties; it is likely that data
which have been assembled into printed volumes,
print style and arrangement remaining completely
uniform over many pages, can be handled by automatic
print readers at a fraction of the cost of
manual keypunching. To date the main area of
practical application of automatic reading has
been to bank checks, but the problem has been
extensively studied, e.g., in connection with machine
translation of languages, and economical
solutions appear to be imminent.

2.3.3.  Data  Available  from  Data  Centers 

Most of the data to be incorporated into the
NSRDS come from technical data centers or other
organizations engaged in compiling data, and
some of these will provide them in machine-readable
form for reasons of their own. For
example, Professor R. Pepinsky's collection of
crystallographic data is already on magnetic tape.
(This, however, is a collection which has not been
subjected to critical evaluation, and cannot be considered
as standard reference data.) Other existing
centers use cards or various forms of punched
tape, and still others operate manually at present
but will mechanize as they grow. On the other
hand it seems likely that a majority of data centers
will always prefer manual operation.

Even then they may have occasion to record
certain sets of data in machine-readable form.
As we have indicated, this may be done as an aid
to printing, as in the case of the "Crystal Data
Tables" (cf. sec. 2.2.3). Or the data may be the
consequence or result of numerical computations
performed on digital computers, or the result
of measurement using instruments which are
equipped with automatic recording devices. The
latter two uses often occur jointly and reinforce
each other; one of the arguments for automatic
recording of measurements may be the fact that the
results have to be subjected to certain numerical
transformations before being used. Spectrometers,
x-ray diffractometers, and bubble chambers
are examples of instruments which employ automatic
recording on a large scale.

2.3.4.  Equipment  and  Format 

Machine-readable data coming from so many
different sources will appear in a variety of forms.
Different media will be used, such as punched
cards, punched paper tape of varying widths, magnetic
tapes, and possibly others; and the format
used with each medium will not be uniform. It
will be one of the functions of OSRD to coordinate
and standardize these media and formats.
Conversion from one medium to another can be
accomplished automatically, at moderate cost, and
is at worst a minor nuisance, as long as the formats
used correspond to each other in a simple way.
On the other hand, conversion from one format to
a different one may be easy or hard, depending on
whether the source format contains all the information
needed, and whether this information appears
in approximately the same order as in the
target format. Therefore, in order to minimize
the difficulty of conversion of data originating in
different data centers, OSRD ought to establish
one set of standard target formats, and suggest to
the data centers that they use, not necessarily these
but at least some formats which are easily convertible
to the standard ones. For convenience
the standard formats would be expressed in terms
of one particular medium, for example, the ordinary
80-column punched card, since it is widely
used as input to computer systems and for storage
of information, and facilities for keypunching are
widespread and not expensive. This would leave
data centers, and indeed OSRD itself, free to use
any other medium, so long as they choose formats
which translate easily into the standard ones.

Thus OSRD might establish, in cooperation with
the data centers most concerned, standard formats
for recording data on punched cards. There
would, of course, be a different format for each
kind of data; possibly hundreds of formats would
have to be agreed on. The established way by
which computing laboratories record and exchange
detailed information about formats is the "card
sheet," a list of instructions for punching each
column in a card. Thus this part of the job of
OSRD may be described concretely as devising a
card sheet for each of its card decks.
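A card sheet amounts to a table of column assignments, and the same table can drive both recording and reading. The sketch below assumes a purely hypothetical layout; the field names, column ranges, and codes are invented for illustration and do not come from any actual NSRDS deck.

```python
# Hypothetical card sheet: field name -> (first column, last column),
# numbered 1 to 80 and inclusive, as on a keypunch form.
CARD_SHEET = {
    "property_code": (1, 4),
    "material_code": (5, 12),
    "temperature":   (13, 18),
    "pressure":      (19, 24),
    "value":         (25, 34),
    "deck_id":       (73, 80),
}

def punch(fields, sheet=CARD_SHEET):
    """Lay out field strings on an 80-column card image, right-justified."""
    card = [" "] * 80
    for name, (first, last) in sheet.items():
        width = last - first + 1
        text = str(fields.get(name, ""))
        if len(text) > width:
            raise ValueError(f"{name!r} does not fit in {width} columns")
        card[first - 1:last] = text.rjust(width)
    return "".join(card)

def read(card, sheet=CARD_SHEET):
    """Recover the fields from a card image using the same card sheet."""
    if len(card) != 80:
        raise ValueError("a card image must have exactly 80 columns")
    return {name: card[first - 1:last].strip()
            for name, (first, last) in sheet.items()}

fields = {"property_code": "0412", "material_code": "00001234",
          "temperature": "500", "pressure": "10",
          "value": "0.0045967", "deck_id": "DENS0001"}
card = punch(fields)
print(repr(card))
print(read(card) == fields)
```

Because the layout lives in one table, a data center using a different medium only needs a mapping onto these columns to remain convertible to the standard format.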

The cards may not serve directly as computer
input nor as storage medium. In the present
circumstances, cards would be transcribed to
magnetic tape which would serve both purposes
better.  This  transcription  is  character-to- 
character,  and  is  therefore  cheap  and  reversible. 
The  same  card  sheets  which  describe  the  arrange- 
ment on  cards  can  be  used  to  document  the  con- 
tents of  the  tapes.  In  a  few  years,  a  medium 
other  than  tape  may  be  preferable ;  the  transcrip- 
tion, again  character-to-character,  would  be  no 
problem,  and  the  same  documentation  could  con- 
tinue to  be  used.  At  present  we  cannot  foresee 
a  medium  which  would  not  be  compatible  with  the 
punched  card  code  (although  in  the  more  distant 
future,  there  may  be  too  few  distinct  card  codes 
available).  The  converse  is  not  true;  many  tape 
codes,  e.g.,  cannot  be  transferred  to  cards  without 
some  added  structuring. 

It  is  entirely  possible  that  some  of  the  informa- 
tion may  never  be  physically  on  cards;  it  may, 
e.g.,  be  recorded  on  paper  tape  by  the  originating 
laboratory,  transcribed  there  to  magnetic  tape, 
transmitted  to  NBS  over  a  radio  information  link, 
and  recorded  again  on  magnetic  tape.  It  would 
nevertheless  be  convenient  to  think  of  the  arrange- 
ment of  the  information  as  if  it  were  on  cards — as 
in  many  cases  it  will  be. 

We  have  so  far  discussed  digital  (numerical  or 
alphabetical)  information.  A  few  words  are  in 
order  about  graphical  information.  The  digital 
representation  of  curves  described  in  section  2.4.3 
below  is  economical,  but  the  conversion  (consist- 
ing in  the  computation  of  a  number  of  coefficients) 
may  be  considered  too  difficult  by  some  of  the  data 
centers.  Microfilm,  microfiche,  video  tape,  etc.,  can
be  used  more  directly  for  storage,  and  facsimile 
transmission  of  such  information  is  possible.  At 
present  it  is  not  clear  how  such  information  would 
be  integrated  into  the  mainstream  of  digital  com- 
puter operation  envisaged  for  NSRDS.  This  sub- 
ject is  further  discussed  in  section  3.2.4  below. 

2.4.  Information  Storage 

2.4.1.  Form  of  Data 

The  simplest  kind  of  information  with  which 
the  system  deals  is  exemplified  by  the  statement 
that  the  atomic  weight  of  hydrogen  is  1.00797. 
This  is  expressed  by  three  terms : 

Atomic  weight  —  Hydrogen  —  1.00797 

of  which  one  denotes  a  property,  another  a  mate- 
rial, and  the  last  a  value.  A  more  elaborate  item 
is  needed  to  convey  the  information  that  the  den- 
sity of  water  vapor  at  500  °K  and  a  pressure  of  10 
atm  is  0.0045967 : 


Density — Water — 500 — 10 — 0.0045967.


It  is  not  necessary  to  record  the  information  that 
the  numbers  500  and  10  represent  temperature  and 
pressure;  this  information  is  implicit  in  the  defi- 
nition of  "density,"  as  are  the  units  in  which  tem- 
perature, pressure  and  density  are  given.  That 
is  to  say,  the  computer  program  for  retrieving  in- 
formation on  density  must  contain  instructions  to 
the  effect  that  following  the  designation  of  a  mate-
rial there  will  be  recorded  a  series  of  triplets  of
numbers,  namely  two  parameters  representing
temperature  and  pressure,  and  the  corresponding
value  of  density.  (Actually,  the  information  will 
be  stored  in  more  compact  form,  to  be  discussed  in 
sec.  2.4.3.)  There  will  also  be  instructions  for 
adding  the  symbols  °K  and  g/cm3  after  the  appro- 
priate numbers  in  the  printed  output. 
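The idea that parameter names and units belong to the definition of the property, not to each stored item, can be sketched as follows. The dictionary layout and the function name `decode` are assumptions made for illustration; only the two sample items are taken from the text.

```python
# Hypothetical property definitions: parameter names and units are held once
# per property, not repeated with every stored item.
PROPERTY_DEFS = {
    "density": {
        "parameters": [("temperature", "°K"), ("pressure", "atm")],
        "unit": "g/cm3",
    },
    "atomic weight": {"parameters": [], "unit": ""},
}

def decode(prop, material, numbers):
    """Split a flat run of numbers into (parameters..., value) groups
    and attach the names and units implied by the property."""
    spec = PROPERTY_DEFS[prop]
    group = len(spec["parameters"]) + 1
    if len(numbers) % group != 0:
        raise ValueError("incomplete item")
    lines = []
    for i in range(0, len(numbers), group):
        *params, value = numbers[i:i + group]
        conditions = ", ".join(
            f"{name} {p} {unit}"
            for (name, unit), p in zip(spec["parameters"], params)
        )
        head = f"{prop} of {material}"
        if conditions:
            head += f" at {conditions}"
        lines.append(f"{head}: {value} {spec['unit']}".rstrip())
    return lines

print(decode("atomic weight", "hydrogen", [1.00797]))
print(decode("density", "water", [500, 10, 0.0045967]))
```

For density the stored numbers are read as triplets, so only the three numbers 500, 10, and 0.0045967 need to be held for the item quoted above.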

Properties  and  materials  can  be  represented  in 
the  computer  by  numerical  codes.  For  example, 
the  ACS  registry  number  might  be  used  for  mate-
rials, while  properties  might  be  denoted  by  arbi-
trary serial  numbers,  or  by  the  NSRDS  classifica-
tion number  followed  by  one  or  two  digits  which
specify  the  particular  property  within  its  class.
Again,  the  computer  program  has  to  contain  in- 
structions to  replace  these  numbers  by  English 
words  in  the  output. 

In  addition  to  property,  material,  parameters, 
and  value,  an  item  may  contain  comments,  com- 
parable to  footnotes  in  a  printed  table.  These  may 
be  indications  of  source,  such  as  a  laboratory  name 
or  a  literature  reference,  or  explanations,  cautions, 
etc.  The  computer  program  provides  for  printing 
these  where  appropriate.  The  literature  refer- 
ences could  conceivably  be  used  by  themselves  for 
an  entirely  different  purpose,  namely  bibliographic 
searches,  but  it  is  doubtful  whether  OSRD  ought
to  render  services  of  this  kind. 

Apart  from  numerical  data,  there  is  frequent 
need  for  data  in  graphical  form.  One  may  expect 
that  the  demand  for  graphical  data  will  be  some- 
what reduced  by  the  easy  availability  of  numerical 
information,  but  there  will  probably  be  a  residue 
of  curves  which  must  be  stored  and  retrieved.  It 
is  not  yet  clear  how  this  is  best  done.  One  could 
use  a  computer-driven  curve  plotter,  though  past 
experiments  with  such  a  system  for  spectra  have 
not  been  encouraging.  One  could  keep  a  manual 
file  of  graphs  (either  on  microfiche  or  as  hard 
copy)  and  use  the  computer  only  to  obtain  refer- 
ence numbers  to  this  file.  Hardware  exists  for 
automatic  retrieval  of  microfiche,  but  it  would  be 
a  foreign  body  incompatible  with  the  main  system. 
It  is  possible  that  in  the  next  few  years  an  auto- 
matic microfiche  retrieval  system  may  be  devel- 
oped which  can  be  connected  to,  and  steered  by,  the 
main  computer. 

2.4.2.  Arrangement  of  Data 

Since  practically  all  data  in  the  system  are  de- 
scribed as  properties  of  materials  there  are  two 
arrangements  which  suggest  themselves  naturally. 
We  could  group  data  by  properties,  starting  with 
one  property  and  listing  the  values  of  this  prop- 
erty for  all  materials,  then  proceeding  to  a  second 
property,  etc.  Or  we  could  use  materials  as  the 
major  subdivision  and  arrange  by  properties  under 
each  material. 

The method which we actually propose to employ
is a combination of these two obvious ones.
We suggest that the set of all properties be subdivided
into homogeneous groups of related properties,
and that the entire data file be arranged
by this grouping. Within each property group
there would be a listing of all pertinent materials,
and under each material a listing of parameter
values, each followed by the values of the several
properties in the group. This may be illustrated
by the example shown on the following page (from
NBS Monograph 20).

By  saying  that  the  properties  in  a  group  are 
"related"  we  mean  merely  that  they  are  frequently 
used  together.  We  are  therefore  likely  to  save  time 
when  looking  for  the  answers  to  a  group  of  related 
questions. 

The property groups are "homogeneous" in the
sense that the properties in a group depend on the
same parameters, and are meaningful for approximately
the same ranges of these parameters. This
facilitates the storing and also the retrieval, since
the same set of computer instructions can be used
for all properties in the group.

Finally, the arrangement by property groups is
similar to that usually found in print, and therefore
facilitates the original recording of data. On
the whole, this arrangement is a natural extension
of the one to which data suppliers and users have
been accustomed. It hardly needs emphasizing
that we retain the option of employing a grouping
of properties which differs from the conventional
one, whenever this is advantageous for machine
retrieval. In many cases we expect that a homogeneous
group will contain only a single property,
so that we will be arranging "by properties."


2.4.3.  Compact  Storage 

It is most important to store information in the
least possible space, both because storage capacity
in the computer must be paid for and because the
time required for every search may increase with
the number of words stored. We shall consider
two kinds of space savings: omission of identifying
information, and condensation of the functional
values themselves.

The possibility to omit identifying information
depends on details of hardware organization which
cannot yet be foreseen. If every storage location
were addressable, the identifying information,
such as name of property and material and values
of parameters, would be replaced by the choice of
address. For a simple example, suppose that
values of a function of temperature are to be stored
at intervals of 10 °K in consecutive addressable
storage locations beginning with address 12,288.
Computer instructions specifying that the function
value for any argument T is stored at address
12,288 + 0.1 T are sufficient for storage and retrieval,
and no value of T need be stored. Other
identifiers can be handled similarly, or one can
store one identifier value preceding the entire
group of function values to which it pertains; for
instance, record one value of pressure preceding
a group of numbers representing density at different
temperatures.
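The addressing rule can be written out directly. This sketch takes the base address 12,288 and the 10-degree spacing from the formula in the text; the function names are illustrative assumptions.

```python
BASE_ADDRESS = 12_288   # location holding the function value for T = 0
STEP_K = 10             # tabulation interval, deg K

def address_for(temperature_k):
    """Address of the stored function value: 12,288 + 0.1 T."""
    if temperature_k % STEP_K != 0:
        raise ValueError("only tabulated temperatures have an address")
    return BASE_ADDRESS + temperature_k // STEP_K

def temperature_at(address):
    """Inverse mapping: the argument T is recovered from the address alone."""
    return (address - BASE_ADDRESS) * STEP_K

print(address_for(800))                    # location of the value for 800 deg K
print(temperature_at(address_for(800)))    # no value of T was stored
```

The argument is never written to storage; it is implied by the position of the value, which is the saving the paragraph describes.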
Example of Arrangement of Data by Property Groups

[From NBS Monograph 20]

Property Group: Ideal Gas Thermodynamic Functions

  T (°K)    C°p/R      (H°-E°0)/RT    -(F°-E°0)/RT    S°/R

Material: H2 — normal mixture

  ...       2.71388    3.81909        8.45365         12.27274
  ...       2.78512    3.72183        8.41288         12.53471

Material: HD

Note: In this example, four ideal-gas thermodynamic
functions, namely C°p/R etc., have been selected as one
"homogeneous group of related properties" and used as a
major subdivision of the data file. All depend on the same
parameter, temperature (°K), and are recorded for the
same range of this parameter. This major subdivision
of the file is further subdivided according to materials;
under each material there is a listing of values of temperature,
each followed by the corresponding values of the
properties.

In practice it is unlikely that we shall work with
addressable, stored words. In tape and other bulk
storage media, addressing is usually by blocks of
data. These blocks may be of fixed size or they
may be variable but with limits on size imposed
by the economics of their use. The cost of retrieving
a single word is not much less than that for a
whole block of words. A simple procedure is to
store all identifiers common to a block of data at
the beginning of the block, as long as this results
in blocks of manageable size. To go into a more
detailed design at this time would be premature.

Economy in the functional values leads to concepts
which have long been studied by the makers
of mathematical tables: interpolation and approximation.
For simplicity, consider again the case
of one function of one variable, say density as a
function of temperature (at constant pressure):


Density of steam, at 1 atm
[From NBS Circular 500, p. 450]

 Temp.      Density          Differences
  °K        g/cm3

  800     0.000 274 64
                             339
  810           271 25               7
                             332
  820           267 93               8
                             324
  830           264 69               9
                             315
  840           261 54               6
                             309
  850           258 45

In the example this is tabulated at 10 deg intervals.
For temperatures falling between these
values, the density can be obtained by linear interpolation
with an error of not much more than one
unit in the last place. With a computer, interpolation
of higher order, say the third, is quite practical.
For this it would suffice to tabulate values
at much larger intervals, say every 50 deg. Then
a density value at an intermediate temperature
is found by passing a cubic polynomial through
four points, two to the left and two to the right
of the desired value. To save computation one
does not store the density values at all, but instead
stores the coefficients of the interpolating polynomial.
This polynomial will reproduce exactly
the desired value for T = 50°, 100°, etc., and will
give  a  sufficiently  close  approximation  at  other 
values  of  T.  Going  one  step  further,  we  note  that 
there  is  nothing  sacred  about  these  special  values 
of  T,  and  no  reason  to  insist  on  precise  agreement 
at  just  these  points.  So,  instead  of  interpolating 
between  these  we  use  a  polynomial  which  best  ap- 
proximates the  density  function  throughout  the 
entire  interval.  This,  in  turn,  enables  us  to  make 
the  interval  still  longer  without  getting  intolerably 
large  deviations.  In  summary,  then,  we  first 
choose  a  class  of  approximating  functions — say, 
cubic  polynomials — and  a  tolerance  limit — say, 
two  units  in  the  last  decimal  place  of  the  table 
(2 × 10⁻⁸).  We  then  find  the  longest  interval,  be-
ginning at  T=0,  for  which  a  cubic  can  be  found
which  approaches  the  given  density  function
within  2 × 10⁻⁸;  and  we  store  the  end  point  T₁  of
this  interval,  together  with  the  coefficients  of  the
polynomial  of  best  fit.  (The  technique  which  ac-
complishes this  is  the  method  of  Chebyshev  poly-
nomials.) Then  we  find  similarly  a  longest  inter-
val starting  at  T₁,  etc.
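The gain from higher-order interpolation can be tried on the six steam densities tabulated above. In this sketch the values are held in units of 10⁻⁸ g/cm3; the cubic is passed through 800, 810, 840, and 850 deg (an unequally spaced choice made here only because the printed excerpt is short) and then compared with the tabulated entries at 820 and 830 deg, which it did not use. Function names are illustrative.

```python
# Density of steam at 1 atm in units of 1e-8 g/cm3, from the table above.
TABLE = {800: 27464, 810: 27125, 820: 26793,
         830: 26469, 840: 26154, 850: 25845}

def linear(t, t0, t1):
    """Linear interpolation between two tabulated temperatures."""
    y0, y1 = TABLE[t0], TABLE[t1]
    return y0 + (y1 - y0) * (t - t0) / (t1 - t0)

def lagrange(t, nodes):
    """Polynomial through the tabulated values at `nodes` (Lagrange form);
    four nodes give the cubic described in the text."""
    total = 0.0
    for i, ti in enumerate(nodes):
        weight = 1.0
        for j, tj in enumerate(nodes):
            if j != i:
                weight *= (t - tj) / (ti - tj)
        total += weight * TABLE[ti]
    return total

nodes = [800, 810, 840, 850]
for t in (820, 830):
    print(t, TABLE[t], round(lagrange(t, nodes), 1))

# Linear interpolation across a doubled (20 deg) interval, for comparison.
print(820, TABLE[820], linear(820, 810, 830))
```

With these numbers the cubic stays within the tolerance of two units in the last place although its nodes are up to 30 deg apart, while linear interpolation over a 20 deg gap misses the tabulated value at 820 deg by four units.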

It  remains  to  discuss  the  choice  of  the  class  of 
approximating  functions,  which  for  our  example 
has  been  cubic  polynomials.  If  we  increase  the 
degree  of  polynomials  used,  the  interval  which 
each  will  cover  increases,  so  that  we  need  fewer 
intervals  but  more  coefficients  for  each,  and  more 
computation  to  evaluate  the  function.  The  opti- 
mum compromise  between  these  conflicting  fac- 
tors will  differ  for  different  functions,  but  will  in 
any  case  be  for  a  polynomial  of  higher  order  than 
that  used  in  manual  computation.  Functions 
other  than  polynomials  may  be  considered. 
Polynomials  have  the  double  advantage  that  they 
are  easy  to  evaluate  and  easy  to  fit  (i.e.,  the  coeffi- 
cients of  the  optimal  polynomial  are  easily  found) . 
Since  computers  are  made  to  carry  out  the  arith- 
metic operations  of  addition,  subtraction,  multi- 
plication, and  division,  the  only  functions  which 
are  as  easy  to  evaluate  as  polynomials  are  rational 
functions,  and  they  are  hard  to  fit.  They  may  be 
used  in  special  cases  where  singularities  or 
asymptotes  are  present.  Many  kinds  of  orthog- 
onal series,  e.g.,  Fourier  series,  are  just  as  easy  to 
fit  as  polynomials  but  are  harder  to  evaluate. 
One  may  put  up  with  this  drawback  if  the  nature 
of  the  problem  seems  to  call  for  it.  For  example, 
there  is  some  recent  work  on  representing  spectra 
by  sums  of  Gaussians,  each  of  them  with  three 
parameters  representing  the  mean  frequency, 
width  and  intensity  of  one  line.  These  are  not 
quite  orthogonal  but  almost  so. 
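The trade-off between degree and number of intervals can be explored with a small experiment. The sketch below rests on several assumptions that are not in the text: the ideal-gas law stands in for the tabulated steam density, the range 800 to 2000 deg is arbitrary, intervals grow in 10 deg steps, and the fit is interpolation at Chebyshev nodes, which is close to the best approximation but is not the exact best-fit polynomial.

```python
import math

def density(t):
    """Stand-in function (assumption): ideal-gas steam at 1 atm, g/cm3."""
    return 18.015 / (82.057 * t)

def cheb_coeffs(f, a, b, degree):
    """Chebyshev-series coefficients of the polynomial that interpolates f
    at the degree+1 Chebyshev nodes of [a, b]."""
    n = degree + 1
    thetas = [math.pi * (j + 0.5) / n for j in range(n)]
    values = [f(0.5 * (a + b) + 0.5 * (b - a) * math.cos(th)) for th in thetas]
    return [2.0 / n * sum(v * math.cos(k * th) for v, th in zip(values, thetas))
            for k in range(n)]

def cheb_eval(coeffs, a, b, t):
    """Evaluate the series c0/2 + sum c_k T_k(u), with u the scaled argument."""
    u = max(-1.0, min(1.0, (2.0 * t - a - b) / (b - a)))
    th = math.acos(u)
    return 0.5 * coeffs[0] + sum(c * math.cos(k * th)
                                 for k, c in enumerate(coeffs) if k > 0)

def max_error(f, coeffs, a, b, samples=200):
    """Largest deviation on an evenly spaced check grid over [a, b]."""
    return max(abs(f(a + (b - a) * i / samples)
                   - cheb_eval(coeffs, a, b, a + (b - a) * i / samples))
               for i in range(samples + 1))

def segment(f, start, stop, degree, tol, step=10.0):
    """Greedy search: extend each interval while the fit stays within tol.
    Returns a list of (lower end, upper end, coefficients)."""
    pieces, a = [], start
    while a < stop:
        b = min(a + step, stop)
        coeffs = cheb_coeffs(f, a, b, degree)
        if max_error(f, coeffs, a, b) > tol:
            raise ValueError("tolerance cannot be met on the shortest interval")
        best = (b, coeffs)
        while b < stop:
            b = min(b + step, stop)
            coeffs = cheb_coeffs(f, a, b, degree)
            if max_error(f, coeffs, a, b) > tol:
                break
            best = (b, coeffs)
        pieces.append((a, best[0], best[1]))
        a = best[0]
    return pieces

TOL = 2e-8   # two units in the last decimal place of the table
for degree in (1, 3, 5):
    pieces = segment(density, 800.0, 2000.0, degree, TOL)
    stored = len(pieces) * (degree + 2)   # end point plus coefficients
    print(degree, len(pieces), stored)
```

Each line of output gives the degree, the number of intervals, and the count of stored numbers, so the competing effects described above (fewer intervals, more coefficients per interval) can be read off for this particular function.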

The  stored  coefficients,  of  course,  must  be  de- 
rived separately  for  each  table.  This  effort  can 
largely  be  mechanized,  but  even  so  it  is  of  con- 
siderable magnitude. 

In  some  cases  it  may  be  possible  and  desirable 
to  store  numerical  indicators  of  the  accuracy  and/ 
or  precision  of  the  tabulated  data.  Judgments 
about  the  reliability  of  data  are  among  the  princi-
pal  concerns  of  NSRDS,  and  should  be  recorded  as
far  as  possible.  For  the  most  part  such  recording 
will  initially  not  require  great  sophistication,  nor 
will  it  add  appreciably  to  the  requirements  for 
storage  space.  For  the  foreseeable  future  the 
large  majority  of  accuracy  estimates  will  be  quali- 
tative and  will  find  their  expression  in  the  selec- 
tion of  the  tabular  values  from  among  several 
competing  measured  values.  Some  will  be  numer- 
ical (such  error  estimates  are  now  being  tenta- 
tively assigned  e.g.  to  tabular  values  on  heats  of 
formation.)  For  the  tables  occupying  large  por- 
tions of  memory  space,  e.g.  properties  tabulated  as 
functions  of  temperature  and  pressure,  it  will  fre- 
quently be  sufficient  to  record  one  single  number 
representing  the  accuracy  of  the  entire  table. 
The  ultimate  goal  of  recording  an  error  estimate 
alongside  each  tabulated  value  is  a  long  way  off. 

2.4.4.  Storage  of  Instructions 

There  is  an  intimate  connection  between  the  data 
to  be  stored — in  the  case  of  the  preceding  section, 
the  coefficients  of  approximating  functions — and 
the  computer  instructions  needed  to  calculate  these 
functions.  These  instructions  are  an  integral  part 
of  the  stored  information.  Inasmuch  as  they  are 
different  for  each  table  (or  at  least,  there  will  be 
many  different  sets  of  such  instructions,  although 
some  may  apply  to  more  than  one  table)  they 
might  as  well  be  stored  with  the  data.  In  order 
to  minimize  this  storage,  one  will  attempt  to  devise 
a  general  retrieval  program  (e.g.,  "polynomial
approximation"),  or  perhaps  several  such  pro- 
grams, applicable  to  different  classes  of  tables. 
From  such  a  general  program,  the  specific  pro- 
gram needed  for  a  particular  table  is  derived  by 
specifying  a  few  numbers,  like  degree  of  poly- 
nomial, number  of  variables,  etc. ;  only  these  need 
to  be  stored  with  each  table. 
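A general program of this kind might look as follows. The table layout (a dictionary with `degree`, `lower`, and `pieces`) is a hypothetical convention chosen for the sketch; the two linear pieces are derived from the steam densities at 800, 810, and 820 deg quoted in section 2.4.3, in units of 10⁻⁸ g/cm3.

```python
def horner(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 + ... by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def retrieve(table, t):
    """General 'polynomial approximation' routine: everything specific to a
    table (degree, interval end points, coefficients) is read from the table."""
    if t < table["lower"]:
        raise ValueError("argument below the range of the table")
    for upper, coeffs in table["pieces"]:
        if t <= upper:
            if len(coeffs) != table["degree"] + 1:
                raise ValueError("coefficient count does not match degree")
            return horner(coeffs, t)
    raise ValueError("argument above the range of the table")

# Hypothetical stored table: two straight-line pieces through the
# tabulated densities at 800, 810, and 820 deg K.
steam_density = {
    "degree": 1,
    "lower": 800,
    "pieces": [
        (810, [54584.0, -33.9]),   # valid for 800 <= T <= 810
        (820, [54017.0, -33.2]),   # valid for 810 <  T <= 820
    ],
}

print(round(retrieve(steam_density, 805), 1))
print(round(retrieve(steam_density, 820), 1))
```

Only the degree, the end points, and the coefficients are stored with the table; the routine itself is shared by every table of the same class.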

The  point  to  note  is  that  there  is  often  a  tradeoff 
between  data  and  instructions,  and  again  between 
special  instructions  applying  to  only  a  small  seg- 
ment of  data,  and  more  general  ones.  Special  in- 
structions should  be  stored  with  the  data,  so  that 
no  separate  lookup  is  needed ;  general  ones  should 
be  in  internal  computer  memory,  where  they  are 
always  accessible.  Apart  from  the  limited  size  of 
this  memory,  the  major  limitation  on  general  pur- 
pose instructions  is  the  effort  of  creating  them. 
They  are  so  important,  however,  that  this  pro- 
gramming effort  deserves  major  support. 

2.4.5.  Storage  Devices 

As  stated  before  (secs.  2.1.2,  2.2.1)  the  amount
of  information  to  be  stored  in  our  system  may  be 
vaguely  estimated  at  100  million  words.  Internal 
computer  memories  store  usually  65,000  to  131,000 
words.  This  may  increase  in  the  next  few  years, 
but  not  to  anything  like  the  volume  we  require. 
We  shall  therefore  have  to  rely  on  external 
memory  components. 


Conventional external memories are magnetic
tapes and drums, and more recently disk files. In
addition, several new devices have just become
available, and several others are under development
and may be expected to be available when we
need them.

Magnetic tapes have practically unlimited capacity
and are inexpensive. Perhaps 100 reels,
more or less, depending on length of blocks,
would hold all our information, at a cost of a few
thousand dollars. However, most computers have
only a small number of tape reading stations, and
these have to be shared with other users of the
same computer. Tapes have to be mounted and
changed manually. The time to find a particular
item on tape (random access time) is on the order
of minutes, and the rate of reading successive
information is too slow for our needs. Therefore
tape must be ruled out, except perhaps for an initial
period of transition and for certain auxiliary
purposes.

Magnetic drums, until recently limited in capacity,
do now have the capacity required for our
applications. Disks and other existing or future
storage devices likewise possess the necessary
capacity.

Apart from capacity, the transfer rate, i.e., the
number of words which can be transferred from
consecutive storage locations into the main frame
of the computer per unit time, is critical for our
application, since for some of the simpler problems
it will be necessary to scan a section of successive
storage entries and perform only a few
simple computer operations on each item, e.g.,
comparison with a search request. For this purpose,
in order to avoid delays, the transfer time
must not exceed the time required for, say, t[...]
elementary computer operations, about 10 to 20
microseconds with today's computers, corresponding
to a transfer rate of 2 to 4 million bits per
second (cf. sec. 2.1.4). For many other purposes
the computation to be performed with each item of
information will be more complex, and therefore
the transfer rate less critical.

The random access time is not critical; anything
below, say, one second is certainly acceptable.

Magnetic drums and disk files, which have been
in existence for several years and are well tested,
will easily meet all requirements. They are, however,
somewhat expensive. The newer mass storage
media, more reasonable in price, fall somewhat
short in transfer rates. These are quite new and
likely to be improved, and several companies are
working on the development of large external storage
devices, so that it is likely that something suitable
will be available at a reasonable price by the
time it is needed by NSRDS.

Whatever device is used, the file of standard reference
data will be kept permanently in storage on
it, and will be periodically updated. It will therefore
be necessary (cf. sec. 2.5.3) to have the
memory component reserved for the exclusive use
of NSRDS.
2.5.  Access  to  the  Computer

2.5.1.  Man-Machine  Interaction 


In the preceding sections we have frequently
made passing reference to the ways in which the
computer is interrogated or instructed. A large
part of the requests for information received by
OSRD, as well as of the new data to be incorporated
into the files, are collected and run on the
computer, say, once a day, using a general information
retrieval and file updating program.

Communication with the computer will be in the
conventional manner, i.e., using a small peripheral
computer or "secretary computer": questions and
new data are manually keypunched into cards (or,
in a few systems, punched paper tape or a loosely
packed magnetic tape), and then loaded into the
peripheral computer and there converted to magnetic
tape of a format suitable for the main computer,
and possibly combined with input to other
problems; then the whole batch is run on the main
computer, resulting in an output tape; and finally
the output is printed on the peripheral machine
under the control of the output tape.

A similar regime will govern certain special
problems which have their own special programs
but which can nevertheless be batched with each
other or with other problems. In this class are
the housekeeping problems and the preparation of
material for publication, in which the retrieval
of information is followed by detailed editing
procedures.

There is, however, another class of special problems
which cannot be handled in this way. These
are the requests for information which do not fit
into the general information retrieval program
(cf. sec. 2.2.1). They are of great importance for
the system because through them we learn how to
improve the general-purpose program. In most
of these questions it will be necessary to "feel one's
way," asking a tentative question, awaiting the
answer, modifying the original question, etc.
This is analogous to the process of human information
retrieval. Librarians have made serious
studies of this process and report that a large fraction
of all requests for information need rephrasing
at least once, often several times.

For this reason it is deemed essential to have
convenient facilities for man-machine "conversation."
We visualize a console in the offices of
OSRD, with facilities for input by keyboard and
punched cards or punched paper tape; typewriter
output; and the ability to connect on-line to the
main computer. It is likely that similar consoles
will be placed in other locations at NBS, where
they will serve a variety of purposes. The main
computer must be operated under a system which
allows interrupting long problems in order to
admit short requests from the remote stations; the
operation is thus characterized as time sharing
under remote control. Naturally there must be
some buffering to avoid tying up the main computer
while input from, and output to, the remote stations
is slowly processed. A small amount of
memory  is  the  minimum  requirement  for  such  a 
buffer,  but  it  will  probably  be  more  efficient  to  use 
one  or  more  small  satellite  computers  which  are 
on-line  connected  to  the  main  computer  as  well  as 
to  the  remote  stations.  Possibly  OSRD,  because 
of  the  large  amount  of  data  which  it  handles, 
should  have  one  such  satellite  computer  reserved 
for  its  own  remote  use. 
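
The time-sharing regime described in this section, in which long problems are interrupted to admit short requests from remote stations, can be illustrated with a small simulation. This is only a sketch under assumed conditions: the function name, the job names, the time quantum, and the rule that a waiting console request always runs before the next slice of batch work are hypothetical and do not describe any actual NBS operating system.

```python
from collections import deque


def run_time_shared(batch_jobs, console_requests, quantum=5):
    """Simulate a main computer that pauses long problems for short requests.

    batch_jobs:       list of (name, units_of_work) for long batch problems
    console_requests: dict mapping arrival_time -> (name, units_of_work)
    quantum:          assumed length of one uninterrupted slice of batch work
    Returns a list of (finish_time, name) in order of completion.
    """
    batch = deque(batch_jobs)
    waiting = deque()              # console requests that have arrived
    pending = dict(console_requests)
    clock = 0
    finished = []
    while batch or waiting or pending:
        # Admit every console request that has arrived by now.
        for arrival in sorted(t for t in pending if t <= clock):
            waiting.append(pending.pop(arrival))
        if waiting:
            # Short requests are served first and run to completion.
            name, work = waiting.popleft()
            clock += work
            finished.append((clock, name))
        elif batch:
            # Run one slice of the long problem, then look for requests again.
            name, work = batch.popleft()
            step = min(quantum, work)
            clock += step
            if work > step:
                batch.appendleft((name, work - step))
            else:
                finished.append((clock, name))
        else:
            # Nothing to do: idle until the next request arrives.
            clock = min(pending)
    return finished


if __name__ == "__main__":
    print(run_time_shared([("housekeeping", 20)], {3: ("query-A", 2)}))
```

In this toy run the console request waits only until the current slice of the batch problem ends, rather than until the whole problem is finished, which is the behavior the text asks of the operating system.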

2.5.2.  Long-Distance  Access 

Once  the  principle  of  time-shared  remote  con- 
trol of  the  computer  has  been  established,  it  is  only 
a  small  step  to  a  system  which  places  similar  re- 
mote consoles  in  locations  at  much  greater  distance 
from  the  computer,  say  across  the  country.  The 
hardware  techniques  for  doing  this  are  already  in 
existence;  at  the  time  of  this  writing,  stations  at 
the  National  Bureau  of  Standards  in  Washington 
communicate  with  computers  in  Cambridge,  Mass., 
Dartmouth,  N.H.,  and  Phoenix,  Ariz.  Ordinary 
telephone  lines  are  used  for  interconnection.  This 
is  done  not  merely  for  experimental  and  demon- 
stration purposes  but  for  effective  computation, 
albeit  on  a  small  scale.  The  problem  is  thus  not 
a  technical  but  an  economic  one. 

That  there  is  a  need  for  such  facilities  is  made 
plausible  by  the  observation,  made  by  many  exist- 
ing information  centers,  that  a  large  part  of  the 
inquiries  which  they  receive — usually  more  than 
one-half — comes  from  the  installation  in  which 
they  are  located.  One  may  well  suspect  that  con- 
venient access  to  information  is  an  important  fac- 
tor ;  that  there  is  an  equally  great  need  for  infor- 
mation in  the  many  outside  installations,  but  this 
need  does  not  express  itself  in  inquiries  because  of 
the  slowness  and  inconvenience  of  operating  over 
greater  distances.  Indeed,  to  satisfy  this  latent 
need  for  information  may  well  turn  out  to  be  the 
greatest  accomplishment  of  NSRDS. 

Now  there  are  in  principle  two  ways  in  which 
this  can  be  done.  One  can  enable  laboratories 
throughout  the  country  to  obtain  on-line  connec- 
tion, via  long-distance  telephone  lines,  to  the  cen- 
tral computer  and  information  store  at  OSRD; 
or  one  can  duplicate  this  store  in  numerous  geo- 
graphically dispersed  computing  facilities.  This 
saves  the  cost  of  a  long-distance  telephone  call  for 
each  inquiry,  but  involves  a  much  greater  invest- 
ment in  storage  equipment  and  the  considerable 
difficulty  of  keeping  all  these  copies  of  the  original 
store  (and  of  the  computer  programs  which  go 
with  it)  exactly  updated. 
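
The trade-off between the two approaches can be put in rough numerical form. The sketch below is illustrative only: the function names and every figure used in the example call are hypothetical, since the text itself declines to draw up a precise balance sheet. It merely shows how a break-even inquiry volume could be computed once real costs are known.

```python
def annual_cost_central(inquiries_per_year, cost_per_call):
    """Central store: each remote inquiry pays for one long-distance call."""
    return inquiries_per_year * cost_per_call


def annual_cost_replicated(n_centers, storage_cost, update_cost):
    """Duplicated stores: each center pays for storage and for staying updated."""
    return n_centers * (storage_cost + update_cost)


def break_even_inquiries(n_centers, storage_cost, update_cost, cost_per_call):
    """Inquiry volume per year above which duplication becomes the cheaper option."""
    return annual_cost_replicated(n_centers, storage_cost, update_cost) / cost_per_call


if __name__ == "__main__":
    # All four inputs are made-up values chosen only to exercise the formula.
    print(break_even_inquiries(n_centers=5, storage_cost=60_000,
                               update_cost=10_000, cost_per_call=10.0))
```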

There  is  no  point  in  drawing  up  a  precise  bal- 
ance sheet  of  costs  at  this  time,  since  the  informa- 
tion file  does  not  yet  exist,  will  take  several  years 
to  compile,  and  some  cost  items  may  change  rad- 
ically in  the  meantime.  Other  organizations  are 
faced  with  similar  problems,  in  particular  the  Na- 
tional Library  of  Medicine,  and  their  experience 
will  be  valuable  to  us.  We  venture  the  guess  that 
if  a  system  had  to  be  introduced  today,  the  estab- 
lishment of  a  moderate  number  of  "secondary  in- 
formation centers,"  each  a  copy  of  the  primary 
center  at  OSRD,  would  be  optimal;  but  that  as 
time goes on, the optimum will shift in the direction
of greater centralization. In any case, we
envisage the establishment of a nationwide network,
either of secondary centers or of telephone
access to the primary center, as something to be
done only after the primary center itself has been
in operation for a while.

2.5.3.  Administrative  Arrangements 

We  have  already  indicated  that  it  will  prob- 
ably be  desirable  for  OSRD  to  share  the  major 
computing  facility  of  the  National  Bureau  of 
Standards,  but  that  OSRD  will  have  to  procure 
for  its  own  exclusive  use  a  large  external  storage 
component  and  a  remote  console.  In  addition, 
OSRD  will  have  a  heavy  share  in  the  use  of  a 
satellite  computer  to  act  as  buffer  between  the  re- 
mote console  and  the  main  computer ;  it  may  even 
require  one  entire  satellite  computer  for  its  own 
exclusive  use.  This  computer  would  have  to  be 
located  in  the  computing  laboratory — transmis- 
sion at  the  high  pulse  rates  used  by  the  main  com- 
puter limits  the  distance — and  it  would  have  to  be 
operated  in  accordance  with  the  ground  rules  and 
operating  systems  of  the  laboratory;  probably  it 
would  be  operated  by  computing  laboratory 
personnel. 

The  cost  of  main  frame  time  will  depend  on  the 
workload.  For  the  example  given  in  section  2.1.4 
(something  less  than  2  hours  per  day  on  the  IBM 
7094)  and  at  today's  rates,  it  would  be  about
$80,000  per  year.  Future  computers  will  do  the 
same  amount  of  work  at  far  lower  cost,  but  the 
workload  will  undoubtedly  go  up.  If  the  external 
store  is  to  be  acquired  by  rental,  the  annual  cost 
at today's rates might be $60,000; only an order-of-magnitude
estimate is possible. The decision between
renting and purchasing will have to be
made just prior to acquisition; usually the advantages
are almost evenly balanced, and the cost difference
is smaller than the uncertainties in the
present estimates of cost and workload. The cost
of a separate satellite computer, if one is required,
is also hard to foresee, since such smaller computers
come in wide price ranges; the order of magnitude
might be $100,000 of annual rental. The cost
of the remote console is very small by comparison.
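
The order-of-magnitude figures quoted in this section can be added up as follows. The three dollar amounts are the ones given in the text; the dictionary layout, the function name, and the decision to leave the remote console out of the total (the text calls its cost very small) are choices made for this sketch only.

```python
# Annual equipment estimates quoted in the text, in dollars per year.
ANNUAL_ESTIMATES = {
    "main frame time": 80_000,             # under 2 hours per day on the IBM 7094
    "external store rental": 60_000,       # order-of-magnitude figure
    "satellite computer rental": 100_000,  # only if a separate one is required
}


def annual_total(include_satellite=True):
    """Sum the quoted estimates, optionally leaving out the satellite computer."""
    total = 0
    for item, cost in ANNUAL_ESTIMATES.items():
        if item == "satellite computer rental" and not include_satellite:
            continue
        total += cost
    return total


if __name__ == "__main__":
    print(annual_total(include_satellite=True))
    print(annual_total(include_satellite=False))
```

The totals are $240,000 per year with a dedicated satellite computer and $140,000 without one; like their components, they are rough estimates only.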

It is to be assumed that the computer programs
for information retrieval, editing, display, updating,
etc., will not remain static but will be in a continuing
state of development. This may be done
by personnel of the Computation Laboratory, or
of OSRD, or both, but in any case the services of
several full-time programmers will be required.
A much larger number of people, possibly between
25 and 50, will be needed to prepare input
data and requests for information, accept output
data and send them to their destinations, etc.
These will undoubtedly have to be OSRD
personnel.

There will be a somewhat larger initial programming
effort, extending over the first few years
and costing perhaps several hundred thousand dollars;
this cost depends very greatly on how ambitious
the initial general-purpose program is, and
how much is left for later improvement. The initial
keypunching of data, at a few cents per word,
is an even bigger investment (cf. sec. 2.3.1).

Nevertheless, all these costs are not large in comparison
with the intellectual organization of the
information. The latter will represent the principal
effort of NSRDS.


3.  Proposed  Interim  System 


3.1.  Characteristics  of  the  Operation 

The  real-life  conditions  under  which  OSRD  will 
have  to  operate  in  the  next  few  years  are  very  dif- 
ferent from  the  ideal  situation  which  has  been 
postulated,  explicitly  or  tacitly,  for  the  mechanized 
information  system  described  in  the  foregoing. 
We  assumed  the  existence  of  a  network  of  techni- 
cal data  centers,  so  complete  that  every  physical 
property  of  materials  for  which  measured  data 
exist  falls  into  the  province  of  one  or  the  other 
of  those  centers.  We  assumed  that  each  center  has 
collected  the  existing  measured  values  for  all  prop- 
erties for  which  it  is  responsible,  has  critically 
evaluated  them  and  thus  arrived  at  a  collection  of 
standard reference data which it continues to update.
Copies of all these standard reference collections
form the data file of OSRD, and are used
to  reply  to  the  numerous  requests  for  information 
reaching  that  organization  day  by  day  from  all 
parts  of  the  country.

In reality, it will take a long time to establish
recognized technical centers in all areas of physical
science. For the next few years there will be
areas not covered by any center, other areas in
which some work is done by organizations not adhering
to NSRDS, along with a small but increasing
number of member centers operating within
NSRDS. Even in the areas for which centers
exist, it will take time to set up criteria for the
evaluation of data, and more time to perform the
actual compilation and evaluation. Therefore, in
some areas OSRD will have no data at all, in many
other areas it will rely on such compilations as it
can find in the literature or obtain through ad hoc
correspondence; these will be either unevaluated
or subjected only to preliminary informal evaluation
by OSRD staff. And finally, the demand for
information will not arise at once in full strength
but will build up gradually as OSRD becomes
known, as scientists and engineers get into the
habit of using its services, and as they learn how
to adapt their working methods and their approach
to  new  problems  to  the  easy  availability  of  f 
information. 


In brief, this period will be characterized by
rapid change in the quality and quantity of data
and in the volume of demand for services. At the
same time, our own understanding of the situation
will still be deficient, and will become more adequate
only with the passing of time, through our
experience with the operation itself.

3.2.  Inquiry  Services

3.2.1.  Getting  Started 

The circumstances set forth at the end of the
preceding section suggest strongly that information
services to scientists and engineers should begin
at once, without delay, not only because the
benefits accruing to the technical community from
the availability of such services should not be postponed
by several years pending the creation of
data files and systems, but also because the creation
of the systems will itself be aided by the experience
which OSRD will acquire in the process
of rendering services.

It follows that, in many fields, requests for data
will have to be answered before standard reference
data have been so designated. In such cases, rather
than merely indicating to the inquirer that no
SRD are as yet available, it will be preferable to
give him whatever information can be found in
the literature or through personal inquiries. A
suitable disclaimer should be appended to such
replies, cautioning the user that the information
has not been evaluated by NSRDS.

Within OSRD, the handling of such requests is
a function of the Information Services Operation
(ISO). The reaction of ISO to an information
request may take any of the four forms listed in
section 1.3.3 above: referral to an expert, literature
reference, documentation, or data information.

One may wonder whether the willingness to reply
in terms of other than SRD would tend to
overburden the organization. The charter of
NSRDS does not commit us to anything beyond
giving information on SRD, and one could choose
to draw the line there. Actually, however, our
workload for the proposed broader service will be
no greater than it would be for the narrower one
if standard reference data had already been designated
in all fields. Possibly the search for unevaluated
information is more laborious than the retrieval
from an organized file, but on the other
hand a larger fraction of the inquiries will be answered
by mere referral. Thus the volume of work
which ISO is taking upon itself is no greater than
that to which it will eventually have to get accustomed
anyway. Meanwhile, the broader services
will be beneficial to ISO itself as a realistic training
ground, to the customers who need the information,
and most of all to the entire community by
hastening the process of acquainting scientists
with NSRDS and getting them into the habit of
making use of it.

As stated in section 2.2.1 above, the experience
of other data centers leads us to expect that the
number  of  requests  for  information  will  start,  once 
the  existence  of  OSRD  has  become  generally 
known,  at  a  level  of  several  per  day,  and  will  grow 
from  there.  A  majority  of  the  requests  will  be 
"administrative"  and  can  be  handled  by  ISO  staff 
without  difficulty.  The  requests  for  technical  in- 
formation proper  will  be  screened  by  ISO  staff 
and  processed  by  one  of  the  procedures  outlined 
in  the  next  few  sections. 

3.2.2.  Referral 

In  the  early  years,  while  NSRDS  is  still  develop- 
ing, a  large  portion  of  technical  queries  received  is 
likely  to  be  referred  to  experts.  We  expect  that 
this  practice  will  gradually  decrease  but  will 
never  cease  entirely.  There  are  pitfalls  in  many 
seemingly  simple  technical  questions  which  can- 
not be  avoided  by  the  uninitiated.  Until  OSRD 
has  accumulated  some  experience  it  will  be 
well  to  have  all  technical  answers,  even  those  rou- 
tinely prepared  by  OSRD  from  its  own  files, 
checked  by  a  specialist.  Later  on  it  will  be 
possible  to  dispense  with  this  for  the  more  fre- 
quently occurring  types  of  questions.  Perhaps 
the  hardest  problem,  in  the  long  run,  and  one  for 
which  we  have  no  ready  answer,  will  be  for  the 
ISO  staff  to  recognize  when  a  problem  needs  a 
specialist. 

The  experts  to  whom  questions  are  referred  can 
be  taken  from  the  following  groups : 

(a)  Technical  area  managers  of  OSRD. 

(b)  Data  centers  which  adhere  to  NSRDS. 

(c)  Divisions  at  NBS  outside  OSRD. 

(d)  Other  scientists. 

In  general  this  will  be  the  order  of  preference 
in  calling  on  experts,  except  that  sometimes  (c) 
may  precede  (b).  Technical  area  managers  are 
so  few  in  number  that  it  will  often  be  practical  to 
bypass  them.  It  may  be  hoped  that  experts  in 
each  category,  if  they  are  unable  to  handle  an
inquiry,  will  at  least  suggest  other  more  suitable 
experts  in  the  higher  categories. 
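
The order of preference among the four groups of experts can be expressed as a simple ranking rule. The sketch below is an illustration, not a prescribed procedure: the function names, the flag `nbs_before_data_centers`, and the sample entries are hypothetical. The flag covers the exception noted in the text, that group (c) may sometimes precede group (b).

```python
DEFAULT_ORDER = ["a", "b", "c", "d"]  # groups (a) to (d) as listed in the text


def preference_order(nbs_before_data_centers=False):
    """Return the group letters in calling order, optionally putting (c) before (b)."""
    order = list(DEFAULT_ORDER)
    if nbs_before_data_centers:
        i, j = order.index("b"), order.index("c")
        order[i], order[j] = order[j], order[i]
    return order


def rank_experts(experts, nbs_before_data_centers=False):
    """Sort (name, group) pairs by preference; ties keep their original order."""
    order = preference_order(nbs_before_data_centers)
    return sorted(experts, key=lambda expert: order.index(expert[1]))


if __name__ == "__main__":
    candidates = [("Other scientist", "d"), ("Data center", "b"), ("NBS division", "c")]
    print(rank_experts(candidates))
    print(rank_experts(candidates, nbs_before_data_centers=True))
```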

The  system  used  in  referral  must  meet  the  fol- 
lowing conditions,  in  this  order  of  importance :  the 
inquirer  should  receive,  without  undue  delay  and 
without  further  effort  or  annoyance  to  him,  a 
reply  which  is  correct  and  helpful;  the  replying 
expert  should  receive  credit  and  should  not  be  un- 
duly burdened;  ISO  should  be  able  to  add  to  its 
storehouse  of  experience,  both  technically  in 
regard  to  the  specific  question  and  administratively
in  regard  to  statistical  distribution  of
questions  in  general.  Finally,  direct  back-and- 
forth  contact  between  inquirer  and  expert  needs 
to  be  facilitated  for  those  cases  where  the  formula- 
tion of  the  question  itself  presents  problems. 

It  is  believed  that  the  following  stepwise  pro- 
cedure meets  all  these  criteria. 

(a)  ISO  ascertains  what  information  it  can 
furnish  from  its  own  files. 

(b) If this is inadequate, ISO quickly locates a
suitable expert, normally by a series of phone
calls, and establishes that he is able and willing to
handle the inquiry.

(c) ISO forwards the inquiry to the expert by
phone or, if too voluminous, by mail.

(d)  Simultaneously  ISO  informs  the  inquirer 
of  this  action. 

(e)  The  expert  replies  to  ISO,  which  relays  the 
reply  to  the  inquirer  without  delay. 

(f) ISO follows up if the reply is not forthcoming
after a reasonable delay.

(g)  ISO  maintains  statistics  on  inquiries  re- 
ceived and  on  their  disposition. 
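
Steps (a) through (g) can be sketched as a single routine. This is an assumption-laden illustration rather than a description of an existing ISO program: the function name, the dictionary standing in for ISO's own files, the `find_expert` callback, and the wording of the logged actions are all hypothetical. The waiting period before a follow-up in step (f) is not modelled.

```python
def handle_inquiry(question, own_files, find_expert, statistics):
    """Walk one inquiry through the referral procedure and record its disposition.

    own_files:   dict mapping a question to an answer already held by ISO
    find_expert: function returning the name of a willing expert, or None
    statistics:  dict of disposition counts, updated in place (step g)
    Returns the list of actions taken, in order.
    """
    actions = []
    answer = own_files.get(question)                      # step (a)
    if answer is not None:
        actions.append("answered from ISO files")
        disposition = "own files"
    else:
        expert = find_expert(question)                    # step (b)
        if expert is None:
            actions.append("no suitable expert located")
            disposition = "unresolved"
        else:
            actions.append(f"forwarded to {expert}")      # step (c)
            actions.append("inquirer informed")           # step (d)
            actions.append("reply relayed to inquirer")   # step (e)
            actions.append("follow-up scheduled")         # step (f)
            disposition = "referred"
    statistics[disposition] = statistics.get(disposition, 0) + 1  # step (g)
    return actions


if __name__ == "__main__":
    stats = {}
    print(handle_inquiry("sample question", {}, lambda q: "Expert A", stats))
    print(stats)
```

Keeping the statistics update outside the branches ensures that every inquiry is counted, whatever its disposition.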

It  is  important  that  ISO  consider  this  pro- 
cedure not  only  as  a  way  of  satisfying  the  inquirer 
but  also  as  an  opportunity  for  its  own  staff  to 
deepen  their  understanding  of  the  technical  ques- 
tions asked,  so  that  ISO's  own  staff  will  gradually 
be  enabled  to  handle  an  increasing  portion  of 
inquiries. 

In  regard  to  (b),  it  is  likely  that  ISO  will 
rapidly — probably  in  a  matter  of  months — become 
acquainted  with  a  large  number  of  experts  both  at 
NBS  and  in  various  data  centers,  with  their  fields 
of  specialization  and  with  the  degree  of  their  will- 
ingness to  handle  inquiries.  Initially,  the  NBS 
Index  of  Technical  Activities,  the  NAS-NRC 
"Directory  of  Continuing  Numerical  Data  Proj- 
ects," or  inquiry  from  OSRD  technical  area  man- 
agers or  NBS  technical  division  chiefs  will  reveal 
the  names  of  suitable  candidates.  Ultimately,  an 
index  of  such  experts,  compiled  and  published  by 
ISO,  may  in  itself  become  a  useful  addition  to  the 
technical  literature. 

3.2.3.  Reference 

An  inquiry  should  be  answered  by  one  or  more 
references  to  the  literature  if  ISO  is  able  to  obtain 
these  references  without  undue  effort  and  if,  at  the 
same  time,  it  is  impractical  to  provide  copies  of  the 
pertinent  documents  or  portions  of  them.  The 
latter  would  be  the  case,  e.g.,  for  obscure  journals, 
unpublished  reports,  documents  which  are  in  the 
main  library  of  NBS  but  not  in  the  collection  of 
ISO;  also  for  inquiries  where  the  reply  involves 
a  large  number  of  pages,  too  difficult  to  reproduce ; 
and  finally  for  inquiries  answered  by  phone.  In 
all  other  cases,  namely  where  an  inquiry  can  be 
satisfactorily  answered  by  mailing  copies  of  a  few 
pages  from  a  document  easily  accessible  to  ISO, 
it  will  be  preferable  to  do  so  rather  than  merely  to 
give  the  inquirer  a  literature  reference. 
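
The choice between a literature reference and mailed copies can be written as a small decision rule. The following is a sketch only: the function name, the parameter names, and in particular the page limit of 10 are hypothetical, since the text speaks only of "a few pages" and of replies "too difficult to reproduce."

```python
def choose_reply_form(in_iso_collection, page_count, answered_by_phone,
                      max_pages_to_copy=10):
    """Return "reference" or "documentation" for an inquiry ISO can answer.

    in_iso_collection: True if the document is easily accessible to ISO
    page_count:        number of pages the reply would involve
    answered_by_phone: True if the inquiry is being answered by telephone
    max_pages_to_copy: assumed limit above which copying is impractical
    """
    copies_impractical = (
        not in_iso_collection            # obscure journal, unpublished report, etc.
        or page_count > max_pages_to_copy
        or answered_by_phone
    )
    return "reference" if copies_impractical else "documentation"


if __name__ == "__main__":
    print(choose_reply_form(in_iso_collection=True, page_count=3,
                            answered_by_phone=False))
```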

The  main  limiting  factor  to  the  use  of  both
reference  and  documentation  (furnishing  of 
copies)  will  be  the  difficulty  of  locating  the  refer- 
ences. For  this  purpose  ISO  would  either  have 
to  maintain  a  large,  well  indexed  and  updated  file 
of  literature  reference,  or  to  undertake  an  ad  hoc 
search  separately  for  each  inquiry.  Either  alter- 
native is  so  laborious  that  there  is  at  present  no 
justification  for  adopting  it,  in  view  of  the  likeli- 
hood that  ultimately,  some  years  from  now,  there 
will  be  a  mechanism  for  answering  almost  all 
questions  from  the  data  file  itself,  without  refer- 
ring to  the  literature. 

It is therefore proposed that the use of both
referencing and documentation in answering
inquiries be limited to those cases where the needed
references are obtainable without too much effort.
Potentially, the job of obtaining references,
especially those to recent documents, can be greatly
facilitated by the use of a citation index. Therefore,
the development of citation indexes, and of
the related subject of bibliographic coupling,
should be closely watched. Presently available
citation indexes, namely those of M. M. Kessler at
the MIT library and of the Institute for Scientific
Information in Philadelphia, are somewhat deficient
in coverage and accessibility. Nevertheless,
there are reports of encouraging trial uses of these
indexes. Similar experiments are contemplated
by OSRD. (See sec. 2.2.4.)

3.2.4.  Documentation 

Under  this  heading  we  discuss,  as  mentioned  be- 
fore, the  furnishing  of  copies  of  documents  in  re- 
sponse to  data  inquiries.  For  the  next  few  years 
this  will  be  discouraged  by  the  difficulty  of  locat- 
ing references,  as  discussed  in  the  preceding  sec- 
tion. In  the  long  run  it  will  be  superseded  by  the 
greater  ease  of  obtaining  the  data  themselves  from 
the  data  file,  without  going  to  the  source  docu- 
ments. In  some  cases,  however,  it  will  remain  de- 
sirable to  furnish  hard  copies,  in  particular  where 
graphical  information  is  involved;  phase  dia- 
grams, contour  lines  of  electron  density  obtained 
from  diffraction  patterns,  shapes  of  spectral  lines. 
Even  though  graphical  information  will  become 
less  popular  as  numerical  data  becomes  more  read- 
ily accessible — at  present  some  graphs  serve  merely 
as  a  convenient  condensed  representation  of  num- 
bers— a  hard  core  of  demand  for  copies  of  graphs 
will  persist.  To  this  must  be  added  requests  for 
copies  of  entire  tables,  sometimes  several  pages  in 
length. 

At  present  the  most  economical  way  to  produce 
copies  is  the  Xerox  process.  This  presupposes  that 
a  hard  copy  is  on  file  in  ISO  (or  less  conveniently, 
in  the  NBS  library).  It  requires  no  further 
investment. 

Let us briefly discuss some of the existing methods
of partial mechanization which might be invoked
to relieve any developing bottlenecks.
They can be characterized as micro-optical systems.
The first to come to mind is microfilm.
Its principal advantage is to reduce the physical
size of the library. It also tends to speed up the
production of copies. It has, for the application
here discussed, the overriding drawback that individual
information items cannot easily be corrected,
inserted or deleted; the only way to do this
is to remake an entire reel of film. We are likely
to be faced with numerous cases where a single
number in a table, or a single graph out of a set of
charts, has to be revised. Since a space shortage
for the storing of documents is not likely to be
critical for some time, microfilm at present does
not appear as a promising prospect.

Microfiche, microcards, and similar systems are
easily updated. The saving in space is less pronounced
than with microfilm. Handling is often
more difficult than with either film or hard copy,
but there are systems with automatic selection of
cards, where handling is no problem. If at some
future date the volume of library holdings and of
daily transactions requiring hard copy becomes too
large for manual operation, such a system of
microfiche or microcards holds promise.

Before introducing it one should investigate, especially
in view of the cost of transcribing an entire
document collection to the new system, how it
will be coordinated with the digital computer operation
envisioned for the future. It would be unfortunate
to be saddled with a semimanual system
for copying graphical information which is incompatible
with the central digital computer system
for the handling of numerical information.
On the other hand, with some development work
it might be possible to have an integrated system:
a microfiche selector which is connected to the
digital computer, receives from the computer the
serial numbers of items to be copied, automatically
searches for these items and copies them, together
with identifying information supplied by the computer
(e.g., serial number of the inquiry).

On the other hand, it may turn out that a curve
plotter directed by the computer is a more economical
way of reproducing graphical information.
This possibility should be examined before a
micro-optical system is proposed for installation.

line  ;,mpare,  in  this  connection,  the  discussion  of  in- 

t.  transmission  and  storage  in  sections  2.3.4  and 

m  1.3.  above.    To  date,  experiments  with  digitally 

produced images of spectra have not been successful,
but the door is not closed.

The furnishing of hard copy may be required
not only in reply to inquiries but also as a service
to data centers, in cases where the latter do not
have the facilities for obtaining such copies
directly.

3.2.5.  Data  Service 

It is expected that ISO will be able from the
start to respond to a number of inquiries by directly
furnishing the desired data, from its own
data file. As stated above, such information
should be accompanied by a suitable disclaimer
stating that the data have not been evaluated for
reliability and therefore do not constitute standard
reference data. As technical centers are established
or integrated into NSRDS, as criteria for
evaluation are set up and the evaluation of data
is undertaken, ISO must incorporate the results
of such evaluation into its files.

One way to do this is to set up a file of evaluative
comments which is arranged in the same order
as the data file itself (cf. sec. 3.4.3 below) and in
which every comment is cross-referenced to the
appropriate data file item. When an inquiry
comes in, the subject is first looked up in this comments
file. If a comment report is found there,
this report, together with the literature items
cross-referenced by it and the data given in these
items, furnish the material for the reply. If no
comment is in the file, the data are looked up in
the data file and used for a reply with a disclaimer
as described above.
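
The two-stage look-up, comments file first and data file second, can be sketched as follows. The record layouts, the function name, and the wording of the disclaimer are hypothetical and are chosen only to make the example self-contained; the text does not specify a file format.

```python
DISCLAIMER = ("These data have not been evaluated for reliability "
              "and do not constitute standard reference data.")


def answer_inquiry(subject, comments_file, data_file):
    """Build a reply for one subject, or return None if nothing is on file.

    comments_file: dict mapping subject -> {"text": str, "items": [item ids]}
    data_file:     dict mapping item id -> {"subject": str, "value": ...}
    """
    comment = comments_file.get(subject)
    if comment is not None:
        # Evaluated material: follow the cross-references into the data file.
        items = [data_file[i] for i in comment["items"] if i in data_file]
        return {"comment": comment["text"], "data": items, "disclaimer": None}
    # No evaluative comment: fall back to the raw data, with a disclaimer.
    items = [rec for rec in data_file.values() if rec["subject"] == subject]
    if not items:
        return None
    return {"comment": None, "data": items, "disclaimer": DISCLAIMER}


if __name__ == "__main__":
    data = {1: {"subject": "property A", "value": 1.0},
            2: {"subject": "property B", "value": 2.0}}
    comments = {"property A": {"text": "evaluated", "items": [1]}}
    print(answer_inquiry("property A", comments, data))
    print(answer_inquiry("property B", comments, data))
```

Because evaluation status lives in a separate comments file, the data file need not be rearranged as evaluations arrive, which is the advantage over the more radical approach of splitting the library into two parts.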

A  more  radical  approach  would  be  to  segregate 
the  entire  ISO  library  into  two  parts,  SRD  and 
other  data.  This  would  save  one  look-up  step  for 
SRD  but  would  require  frequent  rearranging  of 
the  collection,  possibly  affecting  different  parts  of 
the  same  publication  in  different  ways.  Also  this 
approach  would  probably  oversimplify  the  prob- 
lem: quite  possibly  the  result  of  data  evaluation 
will  not  be  a  simple  dichotomy  into  SRD  and  other 
data  but  a  more  detailed  qualitative  description  of 
the  worth  of  the  data. 

In  the  beginning,  as  we  said  above,  all  technical 
responses  issued  by  ISO  should  be  checked  by  tech- 
nical specialists.  Gradually,  ISO  will  learn  by 
experience  that  certain  routinely  recurring  types 
of  questions  do  not  require  such  checking.  At  first 
the  burden  of  checking  will  have  to  be  borne  by 
the  technical  area  managers  of  OSRD  and  by  cer- 
tain specialists  elsewhere  in  NBS.  If  this  load 
gets  too  heavy  because  the  volume  of  inquiries 
grows  faster  than  ISO's  ability  to  handle  them 
without  assistance,  one  of  two  courses  is  open:
either  ISO  acquires  technically  competent  staff  of 
its  own,  or  the  flow  of  inquiries  is  restricted  by 
answering  only  in  terms  of  standard  reference 
data. 

3.3.  Publication  Services 

3.3.1.  Types  of  Services 

In  contrast  to  the  inquiry  services  discussed  in 
the  preceding  sections,  which  are  unscheduled  and 
are  undertaken  on  receipt  of  requests  for  them,  the 
editorial  and  publication  activities  are  scheduled 
by  OSRD  on  its  own  initiative.  We  may  distin- 
guish periodical  and  aperiodical  publications. 

The  principal  output  not  only  of  OSRD  but  of 
the  entire  NSRDS  will  be  monographs,  especially 
data  compilations  and  evaluations.  These  can  take 
different  forms.  There  is  first  of  all  the  "National 
Standard  Reference  Data  Series"  of  the  National 
Bureau  of  Standards,  published  by  the  Govern- 
ment Printing  Office,  of  which  several  numbers 
have  already  appeared.  The  series  will  contain 
tables  of  data  compiled  and  evaluated  under  the 
auspices  of  OSRD  and  related  material.  It  is  in- 
tended to  supplement,  rather  than  supplant,  the 
publication  activities  of  technical  data  centers  and 
other  interested  organizations.  Thus  the  NSRD 
Series  will  have  for  primary  subjects  those  com- 
pilations produced  at  NBS,  or  by  organizations 
which  for  some  reason  or  other  cannot  undertake 
publication,  or  those  for  which  there  is  no  appro- 
priate technical  data  center,  as  well  as  state  of  art 
reports,  lists  of  compilations  considered  to  be 
standard  reference  data,  reports  on  classification, 
indexing,  mechanization,  and  other  topics  of  inter- 
est to  data  compilers,  evaluators  and  users  in 
general. 

In addition to producing the NSRD Series,
OSRD will publish data (usually taken from
monographs) in loose-leaf form, machine-readable
media such as tapes or punched cards, or other
formats which prove to be widely useful.

OSRD will also endeavor to encourage and assist
data centers and other organizations in
publishing their results, especially by providing
editorial  help,  advice  on  mechanization  and  occa- 
sionally financing,  especially  in  situations  where 
such  assistance  will  make  the  difference  between 
prompt  publication  and  long  delay. 

As  for  periodical  publication,  it  may  be  useful 
to  publish  a  news  or  current  awareness  service, 
concerned  with  events  in  the  field  of  data  on  prop- 
erties of  materials.  New  data  compilations  which 
have  been  published,  projects  undertaken  or  com- 
pleted, contracts  awarded,  new  mechanization 
techniques,  etc.,  would  be  listed.  Undoubtedly 
there  is  plenty  of  material  which  is  of  interest — 
currently  OSRD  writes  unpublished  reports  on  its 
own  activities  alone,  running  to  several  pages  per 
month — but  to  collect  this  material  from  the  many 
organizations  involved  would  require  a  consider- 
able effort  and  should  be  done  only  if  there  is  a 
clear  need  for  it.  Some  of  this  information  ap- 
pears in  subject-oriented  publications;  perhaps  in 
time  the  activity  of  data  compilation  and  critical 
evaluation  will  come  to  be  considered  as  a  field  of 
technical  specialization  in  its  own  right,  and  will 
increase  the  demand  for  a  periodical  publication 
service  of  this  kind. 

Another possible activity, unquestionably useful
but requiring an even greater editorial effort,
would be a current bibliography on data compilation,
evaluation and perhaps generation; i.e., a
periodical listing of new published papers and
perhaps unpublished reports on these subjects,
giving at least the bibliographic description (author,
title, place of publication) and perhaps also
abstract, critical review, and/or listing of references,
the latter for use in connection with a citation
index and bibliographic coupling.
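
To indicate how listings of references would serve these two purposes, the following is a minimal sketch, assuming each paper is represented simply by a list of the items it cites. The function names and the identifiers in the usage note are hypothetical.

```python
from itertools import combinations


def citation_index(references):
    """Invert the reference lists: cited item -> sorted list of citing papers."""
    index = {}
    for paper, cited in references.items():
        for item in cited:
            index.setdefault(item, []).append(paper)
    return {item: sorted(papers) for item, papers in index.items()}


def coupling_strengths(references):
    """Bibliographic coupling: number of references shared by each pair of papers."""
    strengths = {}
    for a, b in combinations(sorted(references), 2):
        shared = len(set(references[a]) & set(references[b]))
        if shared:
            strengths[(a, b)] = shared
    return strengths
```

For reference lists {"P1": ["R1", "R2", "R3"], "P2": ["R2", "R3"], "P3": ["R4"]}, papers P1 and P2 are coupled with strength 2, and the citation index lists P1 and P2 under R2.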

OSRD  publications,  especially  those  in  print  or
report  form,  will  be  distributed  by  GPO,  the 
Clearinghouse  for  Federal  Technical  Information, 
or  other  appropriate  agencies,  rather  than  by 
OSRD  itself. 

3.3.2.  Preparing  for  Mechanization 

There  are  several  steps  which  OSRD  has  taken 
or  plans  to  take  in  the  near  future  in  order  to 
assist  in  the  transition  to  mechanized  publication 
described  in  section  2.2.3. 

The  first  of  these  is  the  acquisition  of  a  linofilm 
keyboard,  which  produces  the  15-hole  punched 
paper  tape  needed  to  drive  the  linofilm  composi- 
tion machine.  One  of  the  advantages  of  having 
this  device  at  NBS  is  that  keypunching  can  now  be 
done  under  the  direct  supervision  of  the  scientists 
responsible  for  the  preparation  of  a  manuscript. 
This  makes  it  unnecessary  to  prepare  the  manu- 
script to  the  same  degree  of  perfection  as  if  it  were 
sent  to  a  printer;  rather,  pencilled  corrections  and
verbal  instructions  to  the  keyboard  operator  are 
acceptable,  and  many  questionable  cases  can  be 
settled by discussion as they arise.

Another step in the same direction is the planned
procurement of a modified tape typewriter which
will accept a number of special character inserts
(similar to the commercially available "Typits")
and at the same time produce a punched paper
tape. The choice of special characters, of different
type fonts and other features is far more
limited than with the linofilm keyboard, but the
latter is more difficult to handle than the typewriter
and does not produce an immediately
available typed copy for quick proofreading.
Therefore, and also because of its lower cost, the
tape typewriter is preferred for material of simple
typography.

Both recording methods can be aided by performing
some editing functions on a computer.
Several computer codes for such purposes have
been written; so far, each of these codes was
tailored to one particular publication. One of the
efforts in which OSRD expects to engage in the
near future is the production of more generally
applicable computer codes for publication editing.

An IBM Document Writer has been acquired by
one of the data centers located at NBS and is
being used with great success for material of
intermediate typographical quality.

3.4.  Preparatory  Activities 

3.4.1.  Information  Gathering 

As stated before, the initial period of operation
finds OSRD with inadequate data on both the
need for its services and the tools available for
rendering them. One of the first tasks is to acquire
some information on these two problems.

In regard to need for services, an attempt has
been made to survey the field by means of a short
questionnaire sent initially to all members of the
American Chemical Society, and later, if this is
found desirable, to other interested groups. The
questionnaire will attempt to ascertain which
properties of materials are most often sought in
the literature, how well the existing literature
satisfies this need, which data compilations are
most often consulted, for which properties compilations
need to be prepared and data evaluated.
It will also try to discover existing or incipient
data compilations undertaken by individual
scientists and not widely known. This information
should assist OSRD in setting priorities and
distributing funds among data compilation and
evaluation projects; at the same time it should
indicate to ISO what kind of demand for its services
is to be expected, and it may point to some
existing compilations which should be added to
ISO's library. Preliminary indications are that
the number of obscure compilers to be discovered
in this way is substantial.

Another questionnaire, with a small distribution
list, will ascertain the characteristics of the known
major data centers and similar organizations:
nature, volume, and format of the data they produce,
store, and distribute; policy in regard to
answering inquiries; and funding.

Even with all these attempts it is unlikely that
the full impact of NSRDS on the technical community
can be foreseen by an information-gathering
activity undertaken ahead of time.
Neither the prospective users nor the producers
and distributors of standard reference data are
likely to anticipate the changes in project organization
and working habits which are potential
results of this "information revolution." When
an engineer or scientist can get information on
properties of materials by turning a dial attached
to his desk (spending less effort than walking to
his own bookcase and turning pages in a handbook,
and at the same time receiving answers which are
better evaluated and more up to date), his very
thinking will be directed into different channels
in ways which we cannot foresee at present.
When electronic computers were first contemplated,
it was everyone's conviction that one national
computing center, or at most half a dozen
regional centers, would satisfy all anticipated
computing needs; and no survey of prospective
users could have changed the picture. Similarly,
in order to insure full utilization of the potential
benefits of NSRDS, it will be necessary to obtain
continuous feedback (evaluation, complaints,
and new ideas) from the users of the system; and it
will be necessary for the management of OSRD to
continue to look, on its own initiative, for better
methods and new applications.


3.4.2.  Bibliography 


A major bibliographic survey of existing compilations
of quantitative data on physical and
chemical properties of materials is being undertaken.
It is expected that this survey will cover
all existing compilations of data in the areas outlined
above, whether published in the open literature
or in report form; as far as possible it should
also include unpublished manuscript compilations.
In general, the subject of the survey would be secondary
publications or manuscript collections, not
the primary publications in which newly measured
or calculated data are first communicated. A
good example of such a survey is furnished by the
"Index of Mathematical Tables" by A. Fletcher,
J. C. P. Miller, and L. Rosenhead. This index is
a listing of all mathematical tables whose existence
the authors were able to ascertain. In the few
years of its existence it has become an indispensable
reference work. It appears that the proposed
survey of data compilations should be published
in similar form.

It seems that the survey can best be undertaken
by a joint effort of perhaps six to twelve leading
scientists, each a recognized authority in one of
the major subdivisions of physics and familiar
with all of the important people and projects in
that subdivision. Each of them will be responsible
for one "chapter" of the entire compilation.
There will be one central coordinator in charge of
the entire project, whose job it will be to recruit
chapter authors, delineate their fields of responsibility
and set common standards for the chapters.

The main responsibility of the chapter authors
will  be  to  know  all  the  likely  sources  of  compila- 
tions in  their  fields;  the  job  of  actually  contacting 
these  sources,  verifying,  and  describing  the  extent 
of  existing  tabulations  can  be  left  to  subordinates. 
Some  of  the  chapter  authors  may,  however,  find 
that  their  fields  are  so  large  that  they  have  to  be 
subdivided  into  a  number  of  smaller  specialized 
areas,  each  with  a  separate  "section  author"  who 
would  again  have  to  be  a  recognized  authority  in 
his  field  of  specialization  and  conversant  with  all 
data compilation activities in that field. Examples
of chapter areas might be: nuclear structure
data, infrared spectra, x-ray diffraction patterns,
other solid state properties, etc. These examples
are illustrative only; it should be left to the
judgment  of  the  coordinator  to  delimit  the 
chapters. 

The  survey  will  not  have  to  start  entirely  from 
scratch.  The  Office  of  Critical  Tables  of  the 
NAS-NRC  has  collected  a  list  of  some  of  the  best 
known  data  compilations,  and  is  continuing  this 
work  on  a  small  scale.  OSRD  has  on  its  own  as- 
sembled a  modest  collection  of  compilations. 
These  two  collections  could  be  used  as  starting 
points.  There  also  exist  surveys  in  some  special- 
ized fields:  for  instance,  the  Nuclear  Data  Project 
at  Oak  Ridge  has  made  a  survey  of  compilations 
of  nuclear  data;  it  is  not  yet  complete,  but  the  work
is  being  continued. 

The  publication  of  such  a  survey,  apart  from  its 
importance  for  the  National  Standard  Reference 
Data  System,  would  be  a  most  valuable  addition 
to  the  technical  literature  and  extremely  useful  to 
the  scientific  and  technical  community. 

3.4.3.  Classification 

The  most  important  preparation  for  the  opera- 
tion of  ISO,  apart  from  gathering  the  information 
on  which  it  is  based,  is  the  organization  of  this  in- 
formation. The  bibliographic  survey  and  ques- 
tionnaires discussed  in  the  previous  section  will 
result  in  a  list  of  data  compilations.  Copies  of 
these  will  be  acquired  by  ISO  (many  of  them  are 
undoubtedly  already  in  their  collection)  to  be  used 
in  their  inquiry-answering  service.  It  now  be- 
comes mandatory  to  devise  a  system  of  organizing 
the  collection  in  such  a  way  that  any  desired  item 
in  it  can  be  quickly  located.  This  problem  occurs 
in  every  library,  and  conventional  solutions  are  the 
first  to  be  considered. 

Books  or  documents  in  a  library  are  located 
primarily  by  two  devices:  systematic  classification
and  subject  indexing.  These  two  approaches  are 
discussed  in  this  and  the  next  section. 

Classification  begins  with  the  selection  of  a  clas- 
sification system.  It  has  been  estimated  that  a 
substantial  part  of  the  literature  in  the  physical 
sciences — probably  between  20  and  50  percent  of 
all  papers  published — is  concerned  with  data  on 
properties  of  materials.  This  suggests  that  one 
should  start  by  looking  at  existing  classification 
systems for the physical sciences as a whole. Several
such systems are in widespread use: Universal
Decimal (UDC), Library of Congress (LC), a
system used by Physics Abstracts, etc. It was decided
at an early stage that none of those could
be  used  without  change,  mostly  because  they  are 
obsolete.  It  therefore  became  desirable  to  design 
a  new  classification  system,  made  specially  for  the 
needs  of  NSRDS,  and  hope  that  it  could  somehow 
be  made  compatible  with  the  older  and  broader 
systems.  The  price  we  pay  for  having  our  own 
system  is  that  we  have  to  do  all  the  classifying, 
while  with  a  general-purpose  system  one  might 
have  hoped  to  leave  this  job  in  many  instances  to 
others.  This  is  a  small  effort  for  the  present  col- 
lection of  compilations,  but  a  much  larger  one  for 
the  current  literature. 

Since  the  proposed  classification  system  is  to 
serve  the  needs  of  a  large  segment  of  the  scientific 
community,  it  would  be  desirable  to  have  it  agreed 
upon  by  general  consensus  and  in  cooperation  with 
the  scientific  societies;  indeed,  international  uni- 
formity would  be  most  welcome.  This,  however, 
would  take  years  to  achieve.  It  therefore  be- 
comes expedient  to  proceed  in  two  directions 
simultaneously:  toward  the  long-range  goal  of  a
broadly  based,  cooperatively  designed  classifica- 
tion which  can  be  used  on  a  large  part  of  the 
literature  and  is  convertible  to  the  older  classifica- 
tions to  a  reasonable  degree;  and  toward  a  short- 
term  objective  of  a  classification  adequate  for  the 
internal  operation  of  ISO  during  the  next  few 
years. 

In  pursuit  of  the  long-range  solution,  OSRD 
sponsored  two  pilot  efforts.  One  proceeded  em- 
pirically, in  line  with  modern  trends  in  docu- 
mentation theory,  by  collecting  statistics  on  such 
features  as  overlap  in  the  vocabulary  of  pairs  of 
documents,  and  attempting  to  derive  clusters  of 
documents  which  ought  to  have  fallen  into  the 
same  category  of  the  sought-after  classification. 
On  completion  of  this  study  the  results  were  not 
judged  to  be  sufficiently  promising  to  warrant 
continued  effort.  The  second  long-range  study 
employed  the  conventional  approach  of  selecting 
prominent  technical  attributes  as  a  basis  for  classi- 
fication, but  differed  from  older  attempts  by  its 
use  of  the  most  modern  concepts  of  theoretical 
physics.  Further  development  along  these  lines 
appeared  promising  but  would  have  required  more 
effort  than  it  was  possible  at  the  time  to  devote  to 
this  part  of  the  program.  At  the  same  time,  some 
features  of  this  approach  proved  adaptable  to  the 
short-term  study  being  conducted  in  parallel. 
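
The empirical approach of the first pilot effort can be illustrated by a small sketch. The report does not say which overlap statistic or clustering rule was used; the Jaccard ratio, the threshold value, and the single-link grouping below are assumptions chosen only to make the idea concrete.

```python
import re
from itertools import combinations


def vocabulary(text):
    """Set of distinct lower-case words occurring in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap(a, b):
    """Jaccard overlap of two vocabularies: shared words / all words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, threshold=0.3):
    """Group documents linked by a chain of pairwise overlaps >= threshold."""
    vocab = {name: vocabulary(text) for name, text in docs.items()}
    parent = {name: name for name in docs}

    def find(x):
        # Follow parent links to the representative of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(docs, 2):
        if overlap(vocab[a], vocab[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in docs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

Two documents titled "thermal conductivity of metals" and "thermal conductivity of alloys" share three of five distinct words and fall into one cluster; "infrared spectra of gases" shares only "of" with either and remains alone.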

This  short-term  effort  had  started  while  the  two 
experimental  long-range  studies  were  in  progress, 
and  was  completed  in  the  main,  except  for  some 
details,  a  few  months  later.  It  resulted  in  a  classi- 
fication of  properties  developed  by  OSRD  which 
is  currently  being  used  in  the  operation  of  the 
ISO  data  file.  It  is  strictly  limited  to  physical 
properties  of  materials;  it  avoids  using  as  a  basis 
for  any  stage  of  the  classification  either  materials, 
groups of materials or any other concept. (For
instance, "thermal conductivity" appears as a category,
but "thermal conductivity of aluminum," or
". . . of metals" or ". . . at low temperatures" are
not categories.) It consists of a few hundred
classes, which have proved to be amply adequate
for classifying the present library of ISO and will
undoubtedly be adequate for any foreseeable expansion
of it. If it were used, e.g., for classifying
all scientific papers dealing with properties of materials,
there would probably appear some unmanageably
large classes, which would require further
refinement of the classification.

Once the classification had been designed, the
next step was to assign the volumes of the ISO
library to classes. This step has been completed
for the present holdings but will have to be continued
for future acquisitions. It is the success of
this operation which constitutes the "proof of the
pudding" for the classification system adopted.

The library contains a certain number of documents
which are not primarily tables of data.
These fall outside the classification used. They
are kept separately from the main collection and
are arranged in groups according to a simple ad
hoc classification designed for the purpose.

Finally, the books are shelved in accordance
with the classification. This step would not be
entirely necessary (conceivably the books could be
kept in any order, e.g., by accession number, and
a card file be used to locate the volumes bearing a
desired classification number), but for the manual
operation of a small library such as ISO's shelving
by classification number has a number of well-recognized
advantages.

3.4.4.  Indexing  and  Abstracting 

Conventional libraries rely on author and subject
indexing, in addition to classification, as a
principal means of retrieving information.

The preparation of an index card file arranged
by authors, separately for personal and corporate
authors, is straightforward. For subject indexing
there are two methods: one uses derived index
terms, the other uses assigned ones.

Derived index terms are a recent product of
modern documentation theory. They are usually
based on statistical analyses of the vocabulary in
the document. They have the virtue that they can
be produced by computers, or at least by unskilled
personnel. These methods are still being developed,
and have not been shown to be clearly successful.
There is no more reason (in fact, probably
less reason) to expect them to be successful
in the field of data on properties of materials than
in other fields.
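
As a hedged illustration of what "derived" means here, the sketch below takes the most frequent words of a document, after discarding common words, as its index terms. The stop list, the cut-off k, and the ranking rule are assumptions of this sketch and do not represent any particular published method.

```python
import re
from collections import Counter

# Assumed stop list; a working system would use a much longer one.
STOP_WORDS = frozenset(
    {"the", "of", "and", "a", "in", "to", "for", "is", "are", "on", "at", "by", "with"}
)


def derived_terms(text, k=3):
    """Return the k most frequent non-stop words (ties broken alphabetically)."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    counts = Counter(words)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]
```

No understanding of the subject matter enters at any point, which is both the attraction of the method and the reason for the reservations expressed above.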

Assigned index terms for a document are chosen
by a person who has at least scanned the document,
is at least superficially familiar with the subject
matter, and makes a decision as to the subject
headings under which a user might expect to look
for this document. There are two systems for
doing this. In one the indexer (who in this case
is often the author himself) is free to choose any
terms that occur to him. In the other, terms are
taken from a master list or "thesaurus" if possible.
If the thesaurus contains no suitable term, the
indexer may assign a new term and add it to the
thesaurus.

This system, then, requires two steps: first, the
building up of a thesaurus, and second, the application
of this to a given set of documents. The
thesaurus should be generously cross-referenced
for synonyms and for inclusion relations ("see"
and "see also" references). The difference between
the two systems is smaller than might appear:
a thesaurus-controlled approach can be handled
so liberally that the indexer is in fact free to
use any term that occurs to him, and the uncontrolled
approach can be augmented by maintaining
an alphabetical listing of all index terms used.
There remains the difference that in the thesaurus
approach the indexer has the responsibility to
search, before using any term, for synonymous,
more general, and more special terms already occurring
in the thesaurus. This task is facilitated
if the thesaurus is maintained not only as an alphabetical
listing but also in a systematic hierarchical
arrangement, thus forming a bridge between
indexing and classification.
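
The thesaurus-controlled procedure may be made concrete by a short sketch. The class and method names are hypothetical, and the representation (a set of preferred terms, a table of "see" references for synonyms, and a table of broader terms for the hierarchy) is only one of several that would serve.

```python
class Thesaurus:
    """Master list of index terms with "see" references and a hierarchy."""

    def __init__(self):
        self.terms = set()   # preferred terms
        self.see = {}        # synonym -> preferred term ("see" reference)
        self.broader = {}    # term -> its more general term

    def add_term(self, term, broader=None):
        self.terms.add(term)
        if broader is not None:
            self.terms.add(broader)
            self.broader[term] = broader

    def add_synonym(self, synonym, preferred):
        self.terms.add(preferred)
        self.see[synonym] = preferred

    def resolve(self, term):
        """Preferred form of a term already covered, else None."""
        if term in self.terms:
            return term
        return self.see.get(term)

    def narrower(self, term):
        """More special terms filed directly under the given term."""
        return sorted(t for t, b in self.broader.items() if b == term)

    def assign(self, term):
        """Indexer's step: search first, add a new term only if none is suitable."""
        preferred = self.resolve(term)
        if preferred is None:
            self.add_term(term)
            preferred = term
        return preferred

    def alphabetical(self):
        """Alphabetical listing of preferred terms and "see" entries."""
        return sorted(self.terms | set(self.see))
```

The assign method embodies the difference noted above: the indexer remains free to introduce a term, but only after the search for an existing synonym has failed. The narrower method supplies the hierarchical view which links the index to the classification.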

For reasons which can only be partly detailed
here, we believe that a thesaurus-controlled approach
to indexing should be taken by ISO. Furthermore,
a quite small thesaurus would be adequate
for most purposes. In many instances the
classification according to physical properties will
alone be sufficient to locate desired items of information,
without using the index at all. Indexing
by materials would be useful but, because of the
large number of possible terms, too cumbersome at
least in the beginning. It is suggested that a small
thesaurus be put together from the names of common
classes of materials (e.g., acids, oxides, alcohols,
cyclic compounds), common designations
of parameter ranges or values (e.g., low-temperature,
high-pressure, critical), and a few other
terms expected to be useful (e.g., catalysts, refractories,
dielectrics). The existing library of ISO
should then be tentatively indexed on these terms.
This process will result in suggestions for additional
index terms, which should be added to the
thesaurus and used in a second round of indexing
the collection.

A further step in the intellectual organization of
the ISO library, after classification and indexing,
consists in abstracting. In other environments
the value of abstracts lies in the wide circulation
which they can be given. In the case of ISO, with
its tightly knit organization, this is a minor advantage;
it is almost as easy for the staff to work
with the documents themselves as with a set of
abstract cards. Perhaps the chief gain accruing
from abstracting is that the process will systematically
familiarize the staff with the contents of the
library.

The  information  to  be  put  on  the  abstract  card 
for  a  document  comes  under  the  headings  of 
properties,  materials,  parameters,  and  other  in- 
formation. The  card  should  enumerate  all  prop- 
erties of  materials  on  which  the  document  contains 
data.  On  the  other  hand,  a  complete  enumera- 
tion of  all  materials  referred  to  in  the  document  is 
probably  impractical;  it  will  be  advisable  to  list 
only  major  classes  of  materials  (e.g.,  gases,  metals, 
hydrocarbons).  As  for  parameters,  it  will  usually
be  sufficient  to  list  the  largest  and  smallest  value 
of  each  parameter  (temperature,  pressure,  etc.) 
for  which  the  document  gives  data;  to  indicate 
other  information  or  parameter  values,  such  as  the 
intervals  at  which  functions  are  tabulated,  would 
probably  be  superfluous  detail.  Finally,  under 
"other  information"  the  abstract  card  might  list 
applications  of  the  materials,  instrumentation  of 
measurement,  any  theoretical  discussion  given  in 
the  document,  evaluation  of  the  quality  of  data, 
etc. 
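
The four headings of the abstract card translate naturally into a record with four fields. The following is a minimal sketch; the field names, the parameter name "temperature_K", and the values in the usage note are illustrative assumptions, not data from any document.

```python
from dataclasses import dataclass, field


@dataclass
class AbstractCard:
    document: str
    properties: list        # every property on which the document gives data
    material_classes: list  # major classes only, e.g. "gases", "metals"
    parameter_ranges: dict = field(default_factory=dict)  # name -> (smallest, largest)
    other: list = field(default_factory=list)             # applications, instrumentation, etc.

    def record_parameter(self, name, values):
        """Keep only the smallest and largest tabulated value of a parameter."""
        values = list(values)
        self.parameter_ranges[name] = (min(values), max(values))

    def covers(self, name, value):
        """True if the value lies within the recorded range of the parameter."""
        if name not in self.parameter_ranges:
            return False
        low, high = self.parameter_ranges[name]
        return low <= value <= high
```

A card on which temperatures of 300, 4 and 77 are recorded under "temperature_K" retains only the range (4, 300); the tabulation intervals are deliberately dropped, in keeping with the recommendation above.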

In summary: When a request for data is received,
the searcher will first ascertain which
properties  of  materials  are  involved,  and  will  find 
these  properties  in  the  hierarchical  classification. 
This  indicates  a  small  group  of  documents  in 
which  the  desired  information  should  be  looked 
for.  He  next  consults  the  evaluations  file,  which 
may  point  to  some  data  compilations  of  high 
quality.  Along  with  specialized  compilations,  the 
large  general  data  compilations,  notably  Landolt- 
Boernstein,  must  be  examined.  In  doubtful 
cases,  or  when  the  request  contains  important 
qualifications  which  would  narrow  the  search,  the 
subject  index  can  be  consulted,  which  may  reduce 
the  number  of  documents  to  be  searched.  If 
desired,  the  abstract  cards  for  these  documents 
are  looked  up,  and  this  may  exclude  further  docu- 
ments from  the  search.  The  remaining  docu- 
ments are  then  consulted  in  order  to  locate  the  de- 
sired information. 
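
The order of steps summarized above can be expressed as a short procedure. In this sketch the classification, evaluations file and subject index are assumed to be simple tables from a heading to a list of document identifiers; the function name and all identifiers are hypothetical, and the optional abstract-card step is omitted. It is also an assumption of the sketch that the subject index narrows only the classified documents, while recommended and general compilations are always retained.

```python
def plan_search(properties, classification, evaluations, general,
                index=None, qualifiers=()):
    """Return the documents to consult, best-evaluated first."""
    # Step 1: the classification yields a small group of candidate documents.
    candidates = set()
    for prop in properties:
        candidates.update(classification.get(prop, ()))

    # Optional step: qualifications in the request narrow the group via the index.
    if index is not None:
        for qualifier in qualifiers:
            candidates &= set(index.get(qualifier, ()))

    # Step 2: the evaluations file may point to compilations of high quality.
    ordered = []
    for prop in properties:
        for doc in evaluations.get(prop, ()):
            if doc not in ordered:
                ordered.append(doc)

    # Step 3: remaining candidates, then the large general compilations.
    for doc in sorted(candidates) + list(general):
        if doc not in ordered:
            ordered.append(doc)
    return ordered
```

With three compilations C1, C2, C3 classed under one property, C2 recommended by the evaluations file, and one general compilation G1, the plan is C2, C1, C3, G1; a qualifier indexed only to C2 and C3 shortens it to C2, C3, G1.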


4.  Conclusions 


It emerges from the foregoing pages that during
the  next  few  years  OSRD  should  vigorously  pur- 
sue several  activities  in  parallel.  The  availability 
of  information  services  should  be  announced, 
questioners  should  be  given  the  best  service  pos- 
sible with  the  present  manpower  and  information 
resources  of  ISO,  all  personnel  of  ISO  should 
participate in this service and view it as an opportunity
to become familiar with their subject. A
large, but gradually decreasing, fraction of inquiries
will have to be referred to experts chosen
from  area  managers,  data  centers,  NBS  divisions 
and  occasionally  others.  Where  replies  are  pre- 
pared by  ISO,  all  but  administrative  or  routine 
ones  should  be  checked  by  area  managers  or  other 
NBS scientists. Replies should not be limited to
the furnishing of standard reference data but
should, in the absence of such data, supply literature
references or data taken from the literature,
accompanied  by  a  suitable  disclaimer.  If,  how- 
ever, the  volume  of  inquiries  grows  too  fast  for  the 
present  staff  to  handle,  and  if  an  expansion  of  the 
staff  at  that  time  is  not  feasible,  then  the  flow  of 
information  would  have  to  be  restricted  by  limit- 
ing information  services  to  those  cases  for  which 
standard  reference  data  are  available. 

In  parallel  with  the  foregoing,  ISO  should  de- 
velop a  thesaurus  of  subject  index  terms  for  its  col- 
lection, apply  it  to  the  collection,  and  prepare 
abstracts  of  documents.  Simultaneously  it  should 
broaden  its  collection  in  accordance  with  the 
results of the bibliographic survey of data
compilations now started.

Further  exploration  is  needed  of  the  use  of  cita- 
tion indexing  and  bibliographic  coupling  in  litera- 
ture searching;  use  of  computers  in  editing  and 
publishing;  remote  access  to  computers.

With  these  measures  OSRD  ought  to  operate 
satisfactorily  for  a  few  years;  and  meanwhile 
more  information  would  accumulate  on  which  the 
transition  to  mechanized  operation  could  be  based. 

One  other  approach,  however,  appears  desirable 
and  should  be  undertaken  simultaneously  with  all 
of  the  above.  Rather  than  wait  for  several  years 
and then transfer the entire operation to a computer
in one move, one should select a segment of
the operation which could be computerized before
the rest, to serve as a proving ground. For several
reasons, the area of thermodynamic properties
appears to be an excellent candidate for this role
though atomic spectra or crystal data could be
considered alternatively. In these areas there is
a body of data already in machine-readable form
or now being recorded in this form; and several
data centers outside NBS, as well as competent
scientists within NBS, have shown interest in such
a development.

In the near future we propose to identify a
subset of data to serve as a basis for the development
of a set of computer codes for retrieval and
updating. After such codes are developed
(which will take a good deal of time), they should
be used for six to twelve months in parallel with
manual methods. Thereafter their use could be
extended to the entire technical area of which the
pilot study was a prototype, and a little later the
manual operation for this area could be discontinued.
Only then would the time have come to
look for the memory component and other special
computer features needed in the eventual mechanized
operation.